MATLAB Interface for Apache Parquet

Introduction

Apache™ Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

The MATLAB interface for Apache Parquet supports reading and writing Apache Parquet files from within MATLAB. Functionality includes:

  • Reading and writing local Parquet files
  • Access to the metadata of a Parquet file
  • A MATLAB Datastore for reading Parquet files

Requirements

MathWorks Products (http://www.mathworks.com)

  • Requires MATLAB release R2017b or newer

3rd Party Products:

To build the JAR file, make sure the following products are installed (or download and install them from the links provided):

Apache Hadoop installation and configuration

Linux/MacOS

Download the binaries from the official Apache Hadoop website and unzip them to a local folder.

Microsoft® Windows®

On Windows, download a compatible build of the winutils.exe utility from https://github.com/steveloughran/winutils/raw/master/hadoop-2.8.3/bin/winutils.exe. We recommend placing the executable at <repo_root>\Software\MATLAB\lib\hadoop\bin\winutils.exe

Note that you must first create the lib\hadoop\bin folders manually.
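The folder setup can also be done from the MATLAB prompt. The following is a minimal sketch that assumes MATLAB's current folder is the repository root; the variable names repoRoot, binDir and winutilsPath are illustrative only and not part of the interface. The download itself is still done manually from the link above.

```matlab
% Sketch: create the lib\hadoop\bin folders recommended above.
% Assumption: the current folder (pwd) is <repo_root>; change repoRoot otherwise.
repoRoot = pwd;

% fullfile inserts the platform's file separator (\ on Windows).
binDir = fullfile(repoRoot, 'Software', 'MATLAB', 'lib', 'hadoop', 'bin');

% Create the folder tree only if it does not exist yet;
% mkdir also creates any missing parent folders.
if exist(binDir, 'dir') ~= 7
    mkdir(binDir);
end

% Report whether winutils.exe has already been copied into the folder.
winutilsPath = fullfile(binDir, 'winutils.exe');
if exist(winutilsPath, 'file') == 2
    fprintf('Found %s\n', winutilsPath);
else
    fprintf('Copy the downloaded winutils.exe to %s\n', binDir);
end
```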

More detailed information on the Windows installation can be found here.

Installation

Installing the interface requires building the support package (a JAR file) and setting the HADOOP_HOME environment variable. Before proceeding:

  • Install the Java SDK and Apache Maven.
  • Clone the repository, or download and extract the latest source release.
  • Set the HADOOP_HOME environment variable to point to the local Apache™ Hadoop® installation folder (Linux/MacOS) or, on Windows, to the folder where the winutils.exe executable is located (as explained above).

Download links for these products are provided in the 3rd Party Products section.

Set the environment variable using the standard mechanism for your operating system. Note that it must be set before MATLAB starts; changing it from within MATLAB (for example, with setenv) will not have the desired effect.
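One way to confirm the variable is visible is to query it from a freshly started MATLAB session. This is a minimal sketch using the built-in getenv and exist functions; the variable name hadoopHomeOK is illustrative and not part of the interface, and the check only confirms that the folder exists, not that Hadoop is correctly configured.

```matlab
% Sketch: confirm that HADOOP_HOME is visible to this MATLAB session.
% getenv returns an empty char ('') when the variable is not set.
hadoopHome = getenv('HADOOP_HOME');

if isempty(hadoopHome)
    % Not set: fix it at the OS level, then restart MATLAB.
    fprintf('HADOOP_HOME is not set. Set it in the OS, then restart MATLAB.\n');
    hadoopHomeOK = false;
elseif exist(hadoopHome, 'dir') ~= 7
    % Set, but the folder it names does not exist.
    fprintf('HADOOP_HOME points to a missing folder: %s\n', hadoopHome);
    hadoopHomeOK = false;
else
    fprintf('HADOOP_HOME = %s\n', hadoopHome);
    hadoopHomeOK = true;
end
```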

Build the JAR file

To install the interface, you must first build the JAR file.

cd <this_repo>
cd Software/Java
mvn clean package

Install & Verify MATLAB package

Now you can open MATLAB and install the support package.

cd <this_repo>/Software
install

Restart MATLAB and verify the installation.

Windows

parquetwin('verify')

If you encounter issues, refer to the following documentation. Otherwise, you're good to go.

Linux

parquettools('meta')

Usage

To write a variable to a Parquet file:

A = magic(5);
parquetwrite('m5.parquet', A);

To read the same file back:

B = parquetread('m5.parquet');

To run the included unit tests:

results = runParquetTests()

For more details, see the Basic Usage document.

Documentation

See the documentation for more information.

License

The license for the MATLAB Interface for Apache Parquet is available in the LICENSE.TXT file in this GitHub repository. This package uses certain third-party content, which is licensed under separate license agreements. See the pom.xml file for the third-party software downloaded at build time.

Enhancement Request

Provide suggestions for additional features or capabilities using the following link:
https://www.mathworks.com/products/reference-architectures/request-new-reference-architectures.html

Support

Email: mwlab@mathworks.com

