# Chapter 6

# Data Loading, Storage, and File Formats

Accessing data!

Input and output typically fall into a few main categories (with exceptions):
- Reading text files
- Reading more efficient on-disk formats
- Loading data from databases
- Interacting with network sources like web APIs

## Reading and Writing Data in Text Format

| Function | Description | 
| --------- | ---------- |
| read_csv | Load delimited data from a file, URL, or file-like object; use comma as default delimiter |
| read_table | Load delimited data from a file, URL, or file-like object; use tab ('\t') as default delimiter |
| read_fwf | Read data in fixed-width column format (i.e., no delimiters) |
| read_clipboard | Version of read_table that reads data from the clipboard; useful for converting tables from web pages |
| read_excel | Read tabular data from an Excel XLS or XLSX file |
| read_hdf | Read HDF5 files written by pandas |
| read_html | Read all tables found in the given HTML document |
| read_json | Read data from a JSON (JavaScript Object Notation) string representation |
| read_msgpack | Read pandas data encoded using the MessagePack binary format (removed in pandas 1.0) |
| read_pickle | Read an arbitrary object stored in Python pickle format |
| read_sas | Read a SAS dataset stored in one of the SAS system’s custom storage formats |
| read_sql | Read the results of a SQL query (using SQLAlchemy) as a pandas DataFrame |
| read_stata | Read a dataset from Stata file format |
| read_feather | Read the Feather binary file format |

These `read_` functions share a variety of mechanics. They are designed to convert text data into a DataFrame. Their optional arguments fall into several categories:

### Indexing
- Can treat one or more columns as the index of the returned DataFrame, and controls whether column names come from the file, from the user, or from nowhere at all

### Type Inference and Data Conversion
- Includes user-defined value conversions and custom lists of missing value markers.

### Datetime parsing
- Includes combining capability, such as combining date and time information spread over multiple columns into a single column in the result

### Iterating
- Support for iterating over chunks of very large files

### Unclean data issues
- Skipping rows or a footer, comments, or other minor things like numerical data with thousands separated by commas. 
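
As a rough illustration of these categories, the sketch below applies one option from several of them in a single `read_csv` call. The inline data, the column names, and the `missing` marker are made up for the example; only the `read_csv` parameters themselves come from pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: a comment line, a thousands separator,
# and a custom missing-value marker
raw = """# exported by a made-up tool
day,city,visits
2020-01-01,Oslo,"1,200"
2020-01-02,Rome,missing
"""

df = pd.read_csv(
    StringIO(raw),
    comment="#",            # unclean data: ignore the comment line
    index_col="day",        # indexing: use a column as the row index
    parse_dates=True,       # datetime parsing: parse the index as dates
    na_values=["missing"],  # data conversion: extra missing-value marker
    thousands=",",          # unclean data: read "1,200" as a number
)

print(df)
print(df.dtypes)
```

Because one value in `visits` is missing, the column is expected to come back as floating point rather than integer.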

Some notes:
- `read_csv` has a lot of complex options because of how the format has evolved over time (there are over 50 parameters).
- See the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).
- `read_csv` and some of the other functions perform *type inference*, because the column data types are not part of the data format. You do not always need to specify which columns have a particular data type.
- Some formats, like HDF5, Feather, and msgpack, store the data types in the format itself.
- Handling dates and other custom types can require extra effort, especially in some formats.
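
A minimal sketch of what type inference means in practice, using made-up inline data. Without guidance, `read_csv` infers a column of ZIP-like codes as integers and drops the leading zero, while an explicit `dtype` keeps the text intact.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: the "zip" column looks numeric but is really text
raw = "id,score,zip\n1,9.5,02134\n2,7.0,10001\n"

# Let pandas infer every column type
inferred = pd.read_csv(StringIO(raw))

# Override inference for one column only
explicit = pd.read_csv(StringIO(raw), dtype={"zip": str})

print(inferred.dtypes)
print(inferred["zip"].tolist())   # leading zero is lost
print(explicit["zip"].tolist())   # strings are preserved
```

Inference is convenient for exploratory work, but identifier-like columns are a common case where stating the type is the safer choice.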

In [9]:
import pandas as pd
import numpy as np

In [5]:
!cat examples/ex1.csv
# goes into console to check contents of file. 
# Be careful doing this with large files

a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [11]:
# reading examples csv into a DataFrame
df = pd.read_csv('examples/ex1.csv')
df

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [12]:
pd.read_table('examples/ex1.csv', sep=',')

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [13]:
!cat examples/ex2.csv

1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [14]:
pd.read_csv('examples/ex2.csv', header=None)

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [15]:
pd.read_csv('examples/ex2.csv', names=['a','b','c','d','message'])

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [16]:
names = ['a','b','c','d','message']

In [17]:
pd.read_csv('examples/ex2.csv', names=names, index_col='message')

Unnamed: 0_level_0,a,b,c,d
message,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
hello,1,2,3,4
world,5,6,7,8
foo,9,10,11,12


In [18]:
!cat examples/csv_mindex.csv

key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16


In [20]:
parsed = pd.read_csv('examples/csv_mindex.csv', 
                    index_col=['key1', 'key2'])
parsed

Unnamed: 0_level_0,Unnamed: 1_level_0,value1,value2
key1,key2,Unnamed: 2_level_1,Unnamed: 3_level_1
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16


In [21]:
list(open('examples/ex3.txt'))

['            A         B         C\n',
 'aaa -0.264438 -1.026059 -0.619500\n',
 'bbb  0.927272  0.302904 -0.032399\n',
 'ccc -0.264273 -0.386314 -0.217601\n',
 'ddd -0.871858 -0.348382  1.100491\n']

In [25]:
# Use a raw string so the regular expression backslash is not treated as a string escape
result = pd.read_table('examples/ex3.txt', sep=r'\s+')
result

Unnamed: 0,A,B,C
aaa,-0.264438,-1.026059,-0.6195
bbb,0.927272,0.302904,-0.032399
ccc,-0.264273,-0.386314,-0.217601
ddd,-0.871858,-0.348382,1.100491


In [26]:
!cat examples/ex4.csv

# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [27]:
pd.read_csv('examples/ex4.csv', skiprows=[0, 2, 3])

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo
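
As an alternative to counting row numbers for `skiprows`, the `comment` argument can drop the same lines. The sketch below inlines the contents of `ex4.csv` shown above through `StringIO` so that it runs without the file.

```python
from io import StringIO

import pandas as pd

# Same contents as examples/ex4.csv, inlined for a self-contained example
ex4 = """# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# Lines that start with the comment character are skipped entirely
df = pd.read_csv(StringIO(ex4), comment="#")
print(df)
```

One caveat: `comment` also truncates any data line at the first `#`, so it is only appropriate when that character never appears inside a field.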


In [28]:
!cat examples/ex5.csv

something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo

In [29]:
result = pd.read_csv('examples/ex5.csv')
result

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [30]:
pd.isnull(result)

Unnamed: 0,something,a,b,c,d,message
0,False,False,False,False,False,True
1,False,False,False,True,False,False
2,False,False,False,False,False,False


In [32]:
result = pd.read_csv('examples/ex5.csv', na_values=['NULL'])
result

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [33]:
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}

In [34]:
pd.read_csv('examples/ex5.csv', na_values=sentinels)

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,,5,6,,8,world
2,three,9,10,11.0,12,


| Argument | Description | 
| -------- | ----------- | 
| path | String indicating filesystem location, URL, or file-like object | 
| sep or delimiter | Character sequence or regular expression to use to split fields in each row | 
| header | Row number to use as column names; defaults to 0 (first row), but should be None if there is no header row | 
| index_col | Column numbers or names to use as the row index in the result; can be a single name/number or a list of them for a hierarchical index | 
| names | List of column names for result, combine with header=None | 
| skiprows | Number of rows at beginning of file to ignore or list of row numbers (starting from 0) to skip. | 
| na_values | Sequence of values to replace with NA. | 
| comment | Character(s) to split comments off the end of lines. | 
| parse_dates | Attempt to parse data to datetime; False by default. If True, will attempt to parse all columns. Otherwise can specify a list of column numbers or name to parse. If element of list is tuple or list, will combine multiple columns together and parse to date (e.g., if date/time split across two columns). | 
| keep_date_col | If joining columns to parse date, keep the joined columns; False by default. | 
| converters | Dict containing column number or name mapping to functions (e.g., {'foo': f} would apply the function f to all values in the 'foo' column). | 
| dayfirst | When parsing potentially ambiguous dates, treat as international format (e.g., 7/6/2012 -> June 7, 2012); False by default. | 
| date_parser | Function to use to parse dates. | 
| nrows | Number of rows to read from beginning of file. | 
| iterator | Return a TextFileReader object for reading file piecemeal. | 
| chunksize | For iteration, size of file chunks. | 
| skipfooter | Number of lines to ignore at end of file. | 
| verbose | Print various parser output information, like the number of missing values placed in non-numeric columns. | 
| encoding | Text encoding for Unicode (e.g., 'utf-8' for UTF-8 encoded text). | 
| squeeze | If the parsed data only contains one column, return a Series. | 
| thousands | Separator for thousands (e.g., ',' or '.'). | 

## Reading Text Files in Pieces

When processing very large files or figuring out the right set of arguments to correctly process a very large file, you may only want to read in a small piece of a file or iterate through smaller chunks of the file.

Before we look at a large file, we make the pandas display settings more compact:

In [35]:
pd.options.display.max_rows = 10

In [36]:
result = pd.read_csv('examples/ex6.csv')
result

Unnamed: 0,one,two,three,four,key
0,0.467976,-0.038649,-0.295344,-1.824726,L
1,-0.358893,1.404453,0.704965,-0.200638,B
2,-0.501840,0.659254,-0.421691,-0.057688,G
3,0.204886,1.074134,1.388361,-0.982404,R
4,0.354628,-0.133116,0.283763,-0.837063,Q
...,...,...,...,...,...
9995,2.311896,-0.417070,-1.409599,-0.515821,L
9996,-0.479893,-0.650419,0.745152,-0.646038,E
9997,0.523331,0.787112,0.486066,1.093156,K
9998,-0.362559,0.598894,-1.843201,0.887292,G


In [37]:
pd.read_csv('examples/ex6.csv', nrows=5)

Unnamed: 0,one,two,three,four,key
0,0.467976,-0.038649,-0.295344,-1.824726,L
1,-0.358893,1.404453,0.704965,-0.200638,B
2,-0.50184,0.659254,-0.421691,-0.057688,G
3,0.204886,1.074134,1.388361,-0.982404,R
4,0.354628,-0.133116,0.283763,-0.837063,Q


In [38]:
chunker = pd.read_csv('examples/ex6.csv', chunksize=1000)
chunker
# returns an object that allows you to iterate over parts of the file according to the spec'ed chunksize

<pandas.io.parsers.readers.TextFileReader at 0x10ea35ba0>

In [41]:
chunker = pd.read_csv('examples/ex6.csv', chunksize=1000)

# Give the empty Series an explicit float64 dtype; older pandas versions
# otherwise warn that the default dtype of an empty Series will change
tot = pd.Series([], dtype='float64')
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)

tot[:10]


E    368.0
X    364.0
L    346.0
O    343.0
Q    340.0
M    338.0
J    337.0
F    335.0
K    334.0
H    330.0
dtype: float64

The `TextFileReader` object returned by `read_csv` is also equipped with a `get_chunk` method that enables you to read pieces of an arbitrary size.
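
A small sketch of `get_chunk`, assuming a reader created with `iterator=True`. The inline data is made up so the example runs without `ex6.csv`.

```python
from io import StringIO

import pandas as pd

# Hypothetical ten-row file built in memory
raw = "key,value\n" + "".join(f"k{i % 3},{i}\n" for i in range(10))

reader = pd.read_csv(StringIO(raw), iterator=True)

first = reader.get_chunk(4)    # the first four rows
second = reader.get_chunk(2)   # the next two rows
rest = reader.read()           # everything that is left
reader.close()

print(first)
print(second)
print(len(rest))
```

Unlike `chunksize`, which fixes the piece size up front, `get_chunk` lets each call request a different number of rows.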

## Writing Data to Text Format
Data can also be exported to a delimited format. 

In [42]:
data = pd.read_csv('examples/ex5.csv')
data

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [43]:
data.to_csv('examples/out.csv')

In [44]:
!cat examples/out.csv

,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [45]:
import sys

data.to_csv(sys.stdout, sep='|')

|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo


In [46]:
data.to_csv(sys.stdout, na_rep='NULL')

,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo


In [47]:
data.to_csv(sys.stdout, index=False, header=False)

one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo


In [48]:
data.to_csv(sys.stdout, index=False, columns=['a','b','c'])

a,b,c
1,2,3.0
5,6,
9,10,11.0


In [49]:
dates = pd.date_range('1/1/2000', periods=7)

In [50]:
ts = pd.Series(np.arange(7), index=dates)

In [51]:
ts.to_csv('examples/tseries.csv')

In [52]:
!cat examples/tseries.csv

,0
2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6
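
To close the loop, here is a sketch of reading such a time series back in. It writes to an in-memory buffer instead of `examples/tseries.csv` so that it is self-contained, and it uses `index_col` and `parse_dates` to rebuild the date index.

```python
from io import StringIO

import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=7)
ts = pd.Series(np.arange(7), index=dates)

# Write to a text buffer rather than a file on disk
buf = StringIO()
ts.to_csv(buf)
buf.seek(0)

# The first column becomes the index and is parsed as dates;
# squeeze turns the single-column DataFrame back into a Series
restored = pd.read_csv(buf, index_col=0, parse_dates=True).squeeze("columns")
print(restored)
```

The values and dates survive the round trip, although details such as the index frequency are not stored in the CSV text.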


## Working With Delimited Formats
It's possible to load most forms of tabular data from disk using functions like `pandas.read_table`.
In some cases, however, some manual processing may still be necessary. It is not uncommon to receive a file with one or more malformed lines that trip up `read_table`.

In [53]:
!cat examples/ex7.csv

"a","b","c"
"1","2","3"
"1","2","3"


In [66]:
import csv

# Open the file in a context manager so it is closed after reading
with open('examples/ex7.csv') as f:
    reader = csv.reader(f)
    for line in reader:
        print(line)

['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3']


In [67]:
with open('examples/ex7.csv') as f:
    lines = list(csv.reader(f))

In [68]:
header, values = lines[0], lines[1:]

In [69]:
data_dict = {h: v for h, v in zip(header, zip(*values))}

data_dict

{'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
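
Note that the `csv` module returns every field as a string. One possible next step, sketched here with the same data inlined through `StringIO`, is to build a DataFrame from the dict of columns and convert the columns to numbers with `pd.to_numeric`.

```python
import csv
from io import StringIO

import pandas as pd

# Same contents as examples/ex7.csv, inlined for a self-contained example
raw = '"a","b","c"\n"1","2","3"\n"1","2","3"\n'

lines = list(csv.reader(StringIO(raw)))
header, values = lines[0], lines[1:]

# Transpose the rows into a dict of columns
data_dict = {h: v for h, v in zip(header, zip(*values))}

# Every value is still a string here, so convert each column
frame = pd.DataFrame(data_dict).apply(pd.to_numeric)
print(frame.dtypes)
```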

In [70]:
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    

In [71]:
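# Reopen the file: the handle from the earlier `with` block is already closed
with open('examples/ex7.csv') as f:
    lines = list(csv.reader(f, dialect=my_dialect))

In [72]:
# Dialect parameters can also be passed individually as keywords
with open('examples/ex7.csv') as f:
    lines = list(csv.reader(f, delimiter='|'))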

### csv dialect options

| Argument | Description |
| -------- | ----------- | 
| delimiter | One-character string to separate fields; defaults to ','. |
| lineterminator | Line terminator for writing; defaults to '\r\n'. Reader ignores this and recognizes cross-platform line terminators. |
| quotechar | Quote character for fields with special characters (like a delimiter); default is '"'. |
| quoting | Quoting convention. Options include csv.QUOTE_ALL (quote all fields), csv.QUOTE_MINIMAL (only fields with special characters like the delimiter), csv.QUOTE_NONNUMERIC, and csv.QUOTE_NONE (no quoting). See Python’s documentation for full details. Defaults to QUOTE_MINIMAL. |
| skipinitialspace | Ignore whitespace after each delimiter; default is False. |
| doublequote | How to handle quoting character inside a field; if True, it is doubled (see online documentation for full detail and behavior). |
| escapechar | String to escape the delimiter if `quoting` is set to `csv.QUOTE_NONE`; disabled by default. |
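
The same dialect options apply when writing. The sketch below defines a semicolon-delimited dialect, writes two rows to an in-memory buffer with `csv.writer`, and reads them back. The row contents are made up for the example.

```python
import csv
from io import StringIO


class my_dialect(csv.Dialect):
    lineterminator = "\n"
    delimiter = ";"
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL


buf = StringIO()
writer = csv.writer(buf, dialect=my_dialect)
writer.writerow(("one", "two", "three"))
# With QUOTE_MINIMAL, only the field containing the delimiter is quoted
writer.writerow(("1", "2;x", "3"))

text = buf.getvalue()
print(text)

# Reading with the same dialect recovers the original fields
rows = list(csv.reader(StringIO(text), dialect=my_dialect))
print(rows)
```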

For files with more complicated or fixed multicharacter delimiters, you will not be able to use the `csv` module. You'll have to do the line splitting and other cleanup using the string's `split` method or the regular expression method `re.split()`.
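
A minimal sketch of the regular expression approach, assuming a hypothetical file whose fields are separated by `::` with inconsistent surrounding whitespace. The lines are given as a list so the example runs on its own.

```python
import re

# Hypothetical lines using a two-character delimiter
raw_lines = ["a :: b :: c\n", "1 ::2:: 3\n", "4::5 ::6\n"]

# Match the delimiter together with any whitespace around it
pattern = re.compile(r"\s*::\s*")

rows = [pattern.split(line.strip()) for line in raw_lines]
header, values = rows[0], rows[1:]

# Transpose the value rows into a dict of columns
data = {h: list(col) for h, col in zip(header, zip(*values))}
print(data)
```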

## JSON Data
JSON (short for JavaScript Object Notation) has become one of the standard formats for sending data by HTTP request between web browsers and other applications. It is a much more free-form data format than a tabular text form like CSV. 

An example JSON object, stored as a Python string:

```Python

obj = """
    {"name": "Wes",
     "places_lived": ["United States", "Spain", "Germany"],
     "pet": null,
     "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},
                  {"name": "Katie", "age": 38,
                   "pets": ["Sixes", "Stache", "Cisco"]}]
} """

```

JSON is very nearly valid Python code, with the exception of its null value `null` and some other nuances (such as disallowing trailing commas at the end of lists).

The basic types are 
- objects (dicts)
- arrays (lists)
- strings
- numbers 
- booleans
- nulls

All of the keys within an object must be strings. There are several libraries for reading and writing JSON data. The most commonly used is `json`, which is built into the Python standard library.

To convert a JSON string to Python form, use `json.loads`:

In [73]:
import json

In [79]:
# `obj` is the JSON string defined above
result = json.loads(obj)
result

In [80]:
# json.dumps converts a Python object back to a JSON string
asjson = json.dumps(result)

In [None]:
siblings = pd.DataFrame(result['siblings'], columns=['name', 'age'])
siblings
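
The basic JSON types listed earlier map onto Python types when parsed. The sketch below uses a made-up document with one value of each type and also shows that non-string dict keys become strings when dumped.

```python
import json

# Hypothetical document with one value of each basic JSON type
doc = '{"obj": {"k": 1}, "arr": [1, 2], "s": "x", "n": 1.5, "b": true, "nil": null}'

parsed = json.loads(doc)

# Record the Python type that each JSON value was converted to
python_types = {key: type(value).__name__ for key, value in parsed.items()}
print(python_types)

# JSON object keys must be strings, so the integer key is converted
dumped = json.dumps({1: "a"})
print(dumped)
```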

In [82]:
!cat examples/example.json

[{"a": 1, "b": 2, "c": 3},
 {"a": 4, "b": 5, "c": 6},
 {"a": 7, "b": 8, "c": 9}]


In [83]:
data = pd.read_json('examples/example.json')

In [84]:
data

Unnamed: 0,a,b,c
0,1,2,3
1,4,5,6
2,7,8,9


In [85]:
print(data.to_json())

{"a":{"0":1,"1":4,"2":7},"b":{"0":2,"1":5,"2":8},"c":{"0":3,"1":6,"2":9}}


In [86]:
print(data.to_json(orient='records'))

[{"a":1,"b":2,"c":3},{"a":4,"b":5,"c":6},{"a":7,"b":8,"c":9}]
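
A short round-trip sketch for the `records` orientation. The DataFrame mirrors the contents of `example.json`, and the JSON text is wrapped in `StringIO` so that `read_json` treats it as file-like input.

```python
from io import StringIO

import pandas as pd

data = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# Serialize as a list of row objects
as_records = data.to_json(orient="records")
print(as_records)

# Parse the same text back into a DataFrame
restored = pd.read_json(StringIO(as_records), orient="records")
print(restored)
```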


## XML and HTML: Web Scraping

Python has many libraries for reading and writing data in the ubiquitous HTML and XML formats. Example libraries include: 
- lxml
- Beautiful Soup
- html5lib

While lxml is comparatively much faster in general, the other libraries can better handle malformed HTML or XML files. pandas has a built-in function, `read_html`, which uses libraries like lxml and Beautiful Soup to automatically parse tables out of HTML files as DataFrame objects. To use it, you must install some additional libraries used by `read_html`:


In [90]:
!{sys.executable} -m pip install lxml beautifulsoup4 html5lib




The `pandas.read_html` function has a number of options, but by default it searches for and attempts to parse all tabular data contained within `<table>` tags. The result is a list of DataFrame objects:

In [94]:
tables = pd.read_html('examples/fdic_failed_bank_list.html')

In [92]:
len(tables)

1

In [93]:
failures = tables[0]

In [95]:
failures.head()

Unnamed: 0,Bank Name,City,ST,CERT,Acquiring Institution,Closing Date,Updated Date
0,Allied Bank,Mulberry,AR,91,Today's Bank,"September 23, 2016","November 17, 2016"
1,The Woodbury Banking Company,Woodbury,GA,11297,United Bank,"August 19, 2016","November 17, 2016"
2,First CornerStone Bank,King of Prussia,PA,35312,First-Citizens Bank & Trust Company,"May 6, 2016","September 6, 2016"
3,Trust Company Bank,Memphis,TN,9956,The Bank of Fayette County,"April 29, 2016","September 6, 2016"
4,North Milwaukee State Bank,Milwaukee,WI,20364,First-Citizens Bank & Trust Company,"March 11, 2016","June 16, 2016"


In [96]:
close_timestamps = pd.to_datetime(failures['Closing Date'])

In [97]:
close_timestamps.dt.year.value_counts()

2010    157
2009    140
2011     92
2012     51
2008     25
       ... 
2004      4
2001      4
2007      3
2003      3
2000      2
Name: Closing Date, Length: 15, dtype: int64

### Parsing XML with lxml.objectify

XML (eXtensible Markup Language) is another common structured data format supporting hierarchical, nested data with metadata. Sometimes even books can be created from a series of large XML documents.

XML and HTML are structurally similar, but XML is more general. 

Using `lxml.objectify`, we can parse a file and get a reference to the root node of the XML file with `getroot`:

In [98]:
from lxml import objectify

In [99]:
# Assumes the XML file exists at this path
path = 'examples/example.xml'
# Parse inside a context manager so the file handle is closed afterwards
with open(path) as f:
    parsed = objectify.parse(f)
root = parsed.getroot()

`root.INDICATOR` returns a generator yielding each <INDICATOR> XML element. For each record, we can populate a dict of tag names (like `YTD_ACTUAL`) to data values (and can selectively exclude a few tags).

In [101]:
data = []

skip_fields = ['FIELDS_TO_BE_SKIPPED']

for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        if child.tag in skip_fields:
            continue
        el_data[child.tag] = child.pyval
    data.append(el_data)


In [102]:
perf = pd.DataFrame(data)

In [103]:
perf.head()
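
The record-building loop can be tried without the data file. The sketch below swaps in the standard library's `xml.etree.ElementTree` instead of `lxml.objectify` and parses an inline document. Apart from `INDICATOR` and `YTD_ACTUAL`, every tag name and value is made up for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical document; only INDICATOR and YTD_ACTUAL appear in the text above
xml_text = """
<PERFORMANCE>
  <INDICATOR>
    <INDICATOR_NAME>On-Time Performance</INDICATOR_NAME>
    <YTD_ACTUAL>96.9</YTD_ACTUAL>
    <INTERNAL_ID>1</INTERNAL_ID>
  </INDICATOR>
  <INDICATOR>
    <INDICATOR_NAME>Escalator Availability</INDICATOR_NAME>
    <YTD_ACTUAL>97.0</YTD_ACTUAL>
    <INTERNAL_ID>2</INTERNAL_ID>
  </INDICATOR>
</PERFORMANCE>
""".strip()

root = ET.fromstring(xml_text)
skip_fields = ["INTERNAL_ID"]   # hypothetical tag to leave out

records = []
for elt in root.findall("INDICATOR"):
    # Map each child tag to its text, skipping the excluded fields
    records.append(
        {child.tag: child.text for child in elt if child.tag not in skip_fields}
    )

perf = pd.DataFrame(records)
# ElementTree yields text only, so numeric columns need an explicit conversion
perf["YTD_ACTUAL"] = pd.to_numeric(perf["YTD_ACTUAL"])
print(perf)
```

With `lxml.objectify`, the `pyval` attribute performs a similar conversion for each element, which is why the loop above needs no extra step.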

XML data can get very complicated. Each tag can have its own associated metadata. Consider an HTML link tag, which also happens to be valid XML:

```Python
from io import StringIO

# A single HTML link tag, which is also well-formed XML
tag = '<a href="http://www.google.com">Google</a>'
root = objectify.parse(StringIO(tag)).getroot()
```

This allows access to any of the fields (like `href`) in the tag or to the link text:

In [104]:
root

In [105]:
root.get('href')
# returns 'http://www.google.com'

In [106]:
root.text
# returns the link text, 'Google'

## Binary Data Formats

One of the easiest ways to store data is *serialization*. Python's built-in `pickle` serialization stores data in a binary format. pandas objects all have a `to_pickle` method that writes the data to disk in pickle format:

In [108]:
frame = pd.read_csv('examples/ex1.csv')
frame

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [109]:
frame.to_pickle('examples/frame_pickle')

In [111]:
pd.read_pickle('examples/frame_pickle')

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


`pickle` is only recommended as a short-term storage format. The problem is that it is hard to guarantee that the format will be stable over time; an object pickled today may not unpickle with a later version of a library. pandas has tried to maintain backward compatibility when possible, but at some point in the future it may be necessary to “break” the pickle format.

pandas has built-in support for two more binary data formats: HDF5 and, in versions before 1.0, MessagePack. Other storage formats for pandas or NumPy data include:

*bcolz*
- A compressible column-oriented binary format based on the Blosc compression library

*Feather*
- A cross-language column-oriented file format designed with the R programming community
- Feather uses the Apache Arrow columnar memory format

### Using HDF5 Format
HDF5 is a well-regarded file format intended for storing large quantities of scientific array data. It is available as a C library, and it has interfaces available in many other languages, including Java, Julia, MATLAB, and Python. The "HDF" in HDF5 stands for hierarchical data format. Each HDF5 file can store multiple datasets and supporting metadata. Compared with simpler formats, HDF5 supports on-the-fly compression with a variety of compression modes, enabling data with repeated patterns to be stored more efficiently. HDF5 can be a good choice for working with very large datasets that don't fit into memory, as you can efficiently read and write small sections of much larger arrays.

While it is possible to directly access HDF5 files using either the PyTables or h5py libraries, pandas provides a high-level interface that simplifies storing Series and DataFrame objects. The `HDFStore` class works like a dict and handles the low-level details:

In [115]:
# PyTables is published on PyPI under the name `tables`
!{sys.executable} -m pip install tables


In [112]:
frame = pd.DataFrame({'a': np.random.randn(100)})

In [114]:
store = pd.HDFStore('examples/mydata.h5')

In [116]:
store['obj1'] = frame

In [117]:
store['obj1_col'] = frame['a']

In [118]:
store

In [None]:
store['obj1']

In [None]:
store.put('obj2', frame, format='table')

In [None]:
store.select('obj2', where=['index >= 10 and index <= 15'])

In [119]:
store.close()

In [None]:
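# `where` filters rows on read, not on write: store in table format, then query
frame.to_hdf('mydata.h5', key='obj3', format='table')
pd.read_hdf('mydata.h5', 'obj3', where=['index < 5'])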