Pandas Helper Library for reading and writing DataFrames from and to HBase.

lyveng/pandas-hbase

Pandas HBase IO Helper

Persist pandas DataFrame objects to HBase and read them back later.

Prerequisites

  • HBase Thrift server running at 127.0.0.1:9090
  • HBase table sample_table created with column family cf
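If the table does not exist yet, it can be created through the same Thrift connection. The following is a minimal sketch, not part of pdhbase: `ensure_table` is a hypothetical helper name, and it assumes a happybase-style connection object exposing `tables()` and `create_table(name, families)`.

```python
def ensure_table(connection, table_name, column_family):
    """Create table_name with one column family unless it already exists.

    `connection` is assumed to be a happybase.Connection (or any object
    with the same tables() / create_table() methods).
    Returns True if the table was created, False if it was already there.
    """
    # happybase may return table names as bytes on Python 3, so normalise.
    existing = {
        name.decode() if isinstance(name, bytes) else name
        for name in connection.tables()
    }
    if table_name in existing:
        return False
    # An empty options dict asks HBase for default column family settings.
    connection.create_table(table_name, {column_family: dict()})
    return True
```

With the prerequisites above, the call would be `ensure_table(connection, 'sample_table', 'cf')`.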

Known Issues:

  • Works only with DataFrames that have integer indices.
  • DataFrames to be persisted must not have ':' in column names.
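Both limitations can be checked before writing. The sketch below is not part of pdhbase: `check_persistable` is a hypothetical helper, and it assumes that "integer indices" means every index label is an integer. It only reads `df.index` and `df.columns`, so it needs no imports beyond the standard library.

```python
import numbers


def check_persistable(df):
    """Return a list of problems that would hit the known issues above.

    `df` is expected to be a pandas DataFrame; an empty list means
    neither known issue applies.
    """
    problems = []
    # Assumption: an "integer index" means every label is an integer.
    # bool is excluded explicitly because it subclasses int.
    for label in df.index:
        if isinstance(label, bool) or not isinstance(label, numbers.Integral):
            problems.append("index contains non-integer label %r" % (label,))
            break
    # Column names must not contain ':'.
    bad_columns = [col for col in df.columns if ":" in str(col)]
    if bad_columns:
        problems.append("column names contain ':': %r" % (bad_columns,))
    return problems
```

A caller could raise an error, or rename columns and reset the index, whenever the returned list is not empty.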

Writing DataFrame to HBase

Establish an HBase connection using happybase and write the DataFrame.

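    import happybase
    import numpy as np
    import pandas as pd
    import pdhbase as pdh

    connection = None
    try:
        # Connect to the HBase Thrift server (default port 9090).
        connection = happybase.Connection('127.0.0.1')
        connection.open()
        # Build a sample DataFrame: five numeric columns plus one string column.
        df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
        df['f'] = 'hello world'
        # Store the DataFrame in sample_table under the key 'df_key', column family 'cf'.
        pdh.to_hbase(df, connection, 'sample_table', 'df_key', cf='cf')
    finally:
        # Always release the connection, even if the write fails.
        if connection:
            connection.close()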

Reading DataFrame from HBase

Establish an HBase connection using happybase and read the DataFrame.

    import happybase
    import pdhbase as pdh

    connection = None
    try:
        # Connect to the HBase Thrift server (default port 9090).
        connection = happybase.Connection('127.0.0.1')
        connection.open()
        # Load the DataFrame previously stored under the key 'df_key'.
        df = pdh.read_hbase(connection, 'sample_table', 'df_key', cf='cf')
        print(df)
    finally:
        # Always release the connection, even if the read fails.
        if connection:
            connection.close()
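Both examples repeat the same open/close pattern, which could be factored into a context manager. This is a sketch only and not part of pdhbase: `hbase_connection` is a hypothetical helper, and the connection factory is passed in as an argument (for example `happybase.Connection`) so the block itself has no third-party imports. It assumes the factory returns an object with a `close()` method and, as happybase does by default, opens the connection on construction.

```python
from contextlib import contextmanager


@contextmanager
def hbase_connection(connect, *args, **kwargs):
    """Yield a connection built by `connect` and close it afterwards.

    `connect` is any callable returning a connection object, for
    example happybase.Connection.
    """
    connection = connect(*args, **kwargs)
    try:
        yield connection
    finally:
        # Runs on normal exit and when the body raises.
        connection.close()
```

Under those assumptions, the read example would shrink to a `with hbase_connection(happybase.Connection, '127.0.0.1') as connection:` block whose body calls `pdh.read_hbase(connection, 'sample_table', 'df_key', cf='cf')`.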
