![pandas](pandas.png "pandas")

In [2]:
import pandas as pd
import json

# SAS transport (XPORT) files: third-party module, install it with `pip install xport`
import xport
import xport.v56

As we saw, data manipulation in Python is usually done with pandas, a popular library for data science and analysis. The pandas DataFrame is the object programmers generally use to manipulate tabular data. A DataFrame lives in memory only. To persist it between sessions, or to share it, you have to write it to a file, so you must choose a storage format.

There are many options, and several of them are shown below.

# Export dataframes

## Test data

In [3]:
data = {
    'subjid': [10010, 10011, 10012],
    'gender': ['M', 'F', 'F'],
    'age':    [20, 25, 23],
}

dm = pd.DataFrame(data, columns=['subjid','age', 'gender'])
dm

|   | subjid | age | gender |
|---|--------|-----|--------|
| 0 | 10010  | 20  | M      |
| 1 | 10011  | 25  | F      |
| 2 | 10012  | 23  | F      |


## Send to clipboard

In [4]:
# Copy the dataframe to the system clipboard, e.g. to paste it into Excel.
# This needs a clipboard: on a remote or headless server, pandas raises a PyperclipException instead.
dm.to_clipboard()


## Export to JSON

JSON (JavaScript Object Notation) is a text format that has become very popular. It was originally used to exchange data between a browser and a server, but it can also be used to save data, such as Python dictionaries or pandas dataframes. It is language-agnostic and supported by pandas:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

In [22]:
# Save a dataframe to JSON
dm.to_json('exported_dm.json')

# and then to read it back:
# df = pd.read_json('exported_dm.json')
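The `orient` parameter of `to_json` controls the layout of the JSON text. For a dataframe, the default is `'columns'`. The sketch below does the round trip in memory rather than on disk, which is an assumption made only for illustration. It uses `orient='records'`, which writes one JSON object per row and drops the index. That is often the most readable layout for exchange. The string is wrapped in `StringIO` because recent pandas versions deprecate passing literal JSON text to `read_json`.

```python
from io import StringIO

import pandas as pd

dm = pd.DataFrame({
    'subjid': [10010, 10011, 10012],
    'age':    [20, 25, 23],
    'gender': ['M', 'F', 'F'],
})

# One JSON object per row; the index is not written
records_json = dm.to_json(orient='records')
print(records_json)

# Read it back, telling pandas which layout to expect
back = pd.read_json(StringIO(records_json), orient='records')
print(back)
```

When reading, use the same `orient` value you used when writing. Otherwise pandas may reconstruct the data with rows and columns mixed up.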

## Export to SAS formats

pandas can read SAS7BDAT files (`pd.read_sas`) but cannot write them, because the format is proprietary. However, you can write SAS transport (XPORT/XPT) files with a third-party module:

https://pypi.org/project/xport/

In [24]:
# Code taken from the xport documentation

ds = xport.Dataset(dm, name='DATA', label='demographics')

for k, v in ds.items():
    v.label = k               # Use the column name as SAS label
    v.name = k.upper()[:8]    # SAS V5 names are limited to 8 chars
    if v.dtype == 'object':
        v.format = '$CHAR20.' # SAS character format, width 20
    else:
        v.format = '10.2'     # SAS numeric format: width 10, 2 decimals

library = xport.Library({'DATA': ds})

with open('dm.xpt', 'wb') as f:
    xport.v56.dump(library, f)

Converting column 'subjid' from int64 to float
Converting column 'age' from int64 to float
Converting column 'gender' from object to string
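Truncating names with `k.upper()[:8]`, as in the cell above, can silently map two different columns to the same SAS name. For example, `visitdate1` and `visitdate2` both become `VISITDAT`. The sketch below shows a check you could run before building the `xport.Dataset`. `sas_names` is a hypothetical helper, not part of pandas or xport, and it only enforces the 8-character uppercase rule used above.

```python
import pandas as pd


def sas_names(columns):
    """Hypothetical helper: map column names to uppercase names of at most
    8 characters (the SAS V5 transport limit), refusing silent collisions."""
    mapping = {}
    for col in columns:
        new = str(col).upper()[:8]
        if new in mapping.values():
            clash = [k for k, v in mapping.items() if v == new][0]
            raise ValueError(f"{col!r} and {clash!r} both truncate to {new!r}")
        mapping[col] = new
    return mapping


dm = pd.DataFrame({
    'subjid': [10010, 10011, 10012],
    'age':    [20, 25, 23],
    'gender': ['M', 'F', 'F'],
})
print(sas_names(dm.columns))
# {'subjid': 'SUBJID', 'age': 'AGE', 'gender': 'GENDER'}

try:
    sas_names(['visitdate1', 'visitdate2'])
except ValueError as err:
    print(err)  # the two names collide on 'VISITDAT'
```

If no collision is reported, you can apply the mapping with `dm.rename(columns=sas_names(dm.columns))` before creating the dataset.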


## Export to CSV

pandas supports export to CSV, with many options to control the output: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

In [3]:
# index=False: do not write the row index as an extra, unnamed column
dm.to_csv('exported_dm.csv', index=False)

# and then to read it back:
# df = pd.read_csv('exported_dm.csv')
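By default, `to_csv` also writes the row index as a first, unnamed column. When the file is read back, that column reappears as an extra `Unnamed: 0` column. The in-memory sketch below shows the two usual fixes: `index=False` when writing, or `index_col=0` when reading. Note that CSV stores plain text, so pandas re-infers the column types on reading rather than preserving them.

```python
from io import StringIO

import pandas as pd

dm = pd.DataFrame({
    'subjid': [10010, 10011, 10012],
    'age':    [20, 25, 23],
    'gender': ['M', 'F', 'F'],
})

# Default: the index is written, and read back as an extra column
with_index = pd.read_csv(StringIO(dm.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'subjid', 'age', 'gender']

# Fix 1: do not write the index at all
no_index = pd.read_csv(StringIO(dm.to_csv(index=False)))
print(list(no_index.columns))  # ['subjid', 'age', 'gender']

# Fix 2: write it, but use the first column as the index when reading
as_index = pd.read_csv(StringIO(dm.to_csv()), index_col=0)
print(list(as_index.columns))  # ['subjid', 'age', 'gender']
```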

## Export to Excel formats

The same goes for XLSX files, which require an Excel engine such as openpyxl or XlsxWriter to be installed. Doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

In [8]:
dm.to_excel('exported_dm.xlsx', index=False)

# and then to read it back:
# df = pd.read_excel('exported_dm.xlsx')

## Export to Pickle

Pickle is the standard way to serialize Python objects in a binary format, and pandas supports it:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html

Disadvantages: it guarantees neither cross-language nor cross-version compatibility. Unpickling data from an untrusted source is also a security risk, because it can execute arbitrary code.

In [10]:
dm.to_pickle('exported_dm.pickle')

# and then to read it back
# df = pd.read_pickle('exported_dm.pickle')


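Unlike CSV, pickle stores the dataframe object itself, so pandas-specific types such as the `category` dtype survive the round trip. The sketch below assumes a temporary directory is an acceptable place for the file. Converting `gender` to `category` is only for illustration.

```python
import os
import tempfile

import pandas as pd

dm = pd.DataFrame({
    'subjid': [10010, 10011, 10012],
    'age':    [20, 25, 23],
    'gender': ['M', 'F', 'F'],
})
dm['gender'] = dm['gender'].astype('category')  # illustrative dtype change

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'exported_dm.pickle')
    dm.to_pickle(path)
    back = pd.read_pickle(path)

print(back.dtypes)      # gender is still category
print(back.equals(dm))  # True
```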


## Export to HDF5

HDF5 can store large amounts of data efficiently, which makes it a potential candidate for a department-wide solution. Writing HDF5 from pandas requires the PyTables package (`tables`).

In [15]:
dm.to_hdf('exported_dm.hdf', key='dm', mode='w')

# and then to read it back
# df = pd.read_hdf('exported_dm.hdf', key='dm')



## Export to Feather

Feather is a lightweight binary format that seems to be a very good candidate for persisting dataframes between sessions. pandas supports it through the pyarrow package.

In [25]:
# Save a dataframe to feather:
dm.to_feather('exported_dm.feather')

# and then to read it back
# df = pd.read_feather('exported_dm.feather')

__________________________________________________
Nicolas Dupuis, Methodology and Innovation (IDAR C&SP), 2020+