MATLAB Interface for Apache Parquet
Apache™ Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
The MATLAB interface for Apache Parquet supports reading and writing Apache Parquet files from within MATLAB. Functionality includes:
- Reading and writing local Parquet files
- Access to Parquet file metadata
- A MATLAB Datastore for reading Parquet files
The interface requires MATLAB release R2017b or newer.
3rd Party Products:
To build the JAR file, make sure the following products are already installed (or download and install them from the links provided):
Apache Hadoop installation and configuration
Download the binaries from the official Apache Hadoop website and unzip them to a local folder.
On Windows, a compatible version of the winutils.exe utility can be downloaded from
After downloading, we recommend placing the executable under
Note that you will need to first manually create the
More detailed information on Windows install can be found here.
Installing the interface requires building the support package (a JAR file) and setting the HADOOP_HOME environment variable. Before proceeding:
- Install a Java SDK and Apache Maven.
- Clone this repository, or download and extract (unzip/untar) the latest source release.
- Create or set the HADOOP_HOME environment variable so that it points to the local Apache™ Hadoop® installation folder (Linux/macOS), or to the folder where the winutils.exe executable is located (Windows), as described under 3rd Party Products.
Download links for these products are provided in the 3rd Party Products section.
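Before building, it can help to confirm that the Java compiler and Maven can be found on your PATH. The following is a minimal POSIX shell sketch for Linux/macOS, not part of this package: the helper name require_tools is hypothetical, and it only checks that each named command exists, not which version is installed.

```shell
# Hypothetical helper (not part of this package): print every command
# that cannot be found on PATH and return non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_tools javac mvn && echo "Prerequisites found"` checks for javac (shipped with a Java SDK, not with a JRE alone) and mvn (Apache Maven).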
To set the environment variable, follow the usual procedure for your operating system. Note that the variable must be set before starting MATLAB; changing it from within MATLAB will not have the desired effect.
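Because MATLAB only sees the value of HADOOP_HOME that exists when it starts, it can be worth checking the variable in the shell you launch MATLAB from. The following is a minimal POSIX shell sketch for Linux/macOS (on Windows, use the system environment variable settings instead). check_hadoop_home is a hypothetical helper, not part of this package; it only verifies that the variable is exported and names an existing directory, not that the directory holds a working Hadoop installation.

```shell
# Hypothetical helper (not part of this package). A program started from
# this shell only inherits *exported* variables, so the value is read
# through a child `sh`, which sees exactly what a child process would.
check_hadoop_home() {
    hh=$(sh -c 'printf "%s" "${HADOOP_HOME:-}"')
    if [ -z "$hh" ]; then
        echo "HADOOP_HOME is not set or not exported" >&2
        return 1
    fi
    if [ ! -d "$hh" ]; then
        echo "HADOOP_HOME is not an existing directory: $hh" >&2
        return 1
    fi
    echo "HADOOP_HOME=$hh"
    return 0
}
```

For example, after `export HADOOP_HOME=/path/to/hadoop` (a placeholder path), running `check_hadoop_home && matlab` starts MATLAB only if the check passes, assuming the matlab launcher is on your PATH.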
Build the JAR file
To install the interface, you must first build the JAR file.
cd <this_repo>
cd Software/Java
mvn clean package
Install & Verify MATLAB package
Now you can open MATLAB and install the support package.
cd <this_repo>/Software
install
Restart MATLAB and verify the installation.
In case of issues, please refer to the following documentation. Otherwise, you're good to go.
To write a variable to a Parquet file:
A = magic(5); parquetwrite('m5.parquet', A);
You can then read the same file back with:
B = parquetread('m5.parquet');
A few unit tests can be run with
results = runParquetTests()
For more details, look at the Basic Usage document.
See documentation for more information.
The license for MATLAB interface for Parquet is available in the LICENSE.TXT file in this GitHub repository. This package uses certain third-party content which is licensed under separate license agreements. See the pom.xml file for third-party software downloaded at build time.
Provide suggestions for additional features or capabilities using the following link: