In [1]:
%pip install scipy pyreadr

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pandas as pd
import scipy.io
import pyreadr

# Working with DataFrames in Mat Files (for Matlab) and .Rds Files (for R)

Saving data in a format designed for another programming environment can be a little tricky. Here are two starter demonstrations that save a Pandas DataFrame to Matlab-focused (`.mat`) and R-focused (`.Rds`) files and load it back.

## Working with .MAT Files (from Matlab)


|  Functions | Description |
| -- | -- |
| **`scipy.io.whosmat`** | See what variables are inside the .mat file.  |
| **`scipy.io.savemat`** | Save a .mat file. |
| **`scipy.io.loadmat`** | Load data from a .mat file. |
| **`scipy.io.matlab.matfile_version`**  | Find out what version of the .mat file is saved (can affect what functions are needed) |
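
As a minimal sketch of the two inspection functions in the table, the cell below writes a small throwaway file and then looks at it without loading the data. The file name `demo.mat` and the variables `angle` and `time` are made up for illustration and are not part of the workshop data.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical demo file in a temporary folder (not the workshop data).
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scipy.io.savemat(path, {"angle": np.array([0, 50, 100, 150]),
                        "time": np.array([3107.0, 2930.0])})

# whosmat lists (name, shape, class) for each variable without loading it.
for name, shape, matlab_class in scipy.io.whosmat(path):
    print(name, shape, matlab_class)

# matfile_version returns a (major, minor) tuple.
# major is 0 for v4 files, 1 for v5 files, and 2 for v7.3 (HDF5) files.
major, minor = scipy.io.matlab.matfile_version(path)
print(major, minor)
```

Checking the version first is useful because `scipy.io.loadmat` does not read v7.3 files; those need an HDF5 reader instead.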

In [4]:
df = pd.read_csv('../workshops/data/MentalRotation.csv')
df.head()

Unnamed: 0,Subject,Trial,Angle,Matching,Response,Time,Correct,Age,Sex
0,49,1,0,0,n,3107,1,32,M
1,49,2,150,0,n,2930,1,32,M
2,49,3,150,1,b,1874,1,32,M
3,49,4,100,1,b,3793,1,32,M
4,49,5,50,1,b,2184,1,32,M


## Roundtrip tests: Saving and Loading Data to see what comes out

The `savemat()` function expects a dict-like object whose values are NumPy-like arrays. Passing it a pandas DataFrame works, but each column is saved as its own variable rather than as one table.
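
To make that behaviour explicit, here is a small sketch that builds the column dictionary by hand before saving. The two-column `demo` frame and the temporary file are assumptions for illustration; the result should match what passing the DataFrame directly does for numeric columns.

```python
import os
import tempfile

import pandas as pd
import scipy.io

# Hypothetical stand-in for the workshop DataFrame.
demo = pd.DataFrame({"Angle": [0, 150, 150], "Time": [3107, 2930, 1874]})

# One dictionary entry per column: this is the shape of input savemat expects.
columns = {name: demo[name].to_numpy() for name in demo.columns}

path = os.path.join(tempfile.mkdtemp(), "demo_columns.mat")
scipy.io.savemat(path, columns)

loaded = scipy.io.loadmat(path)
# Each 1-D column comes back as a 2-D array with a single row.
print(loaded["Angle"].shape)
```

The extra dimension appears because Matlab has no 1-D arrays, so every vector is stored as a matrix.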

In [5]:
scipy.io.savemat('mental_rot.mat', df)
scipy.io.loadmat('mental_rot.mat')

{'__header__': b'MATLAB 5.0 MAT-file Platform: nt, Created on: Tue Dec 12 15:15:51 2023',
 '__version__': '1.0',
 '__globals__': [],
 'Subject': array([[49, 49, 49, ..., 33, 33, 33]], dtype=int64),
 'Trial': array([[ 1,  2,  3, ..., 94, 95, 96]], dtype=int64),
 'Angle': array([[  0, 150, 150, ...,  50, 100,   0]], dtype=int64),
 'Matching': array([[0, 0, 1, ..., 0, 1, 0]], dtype=int64),
 'Response': array([[array(['n'], dtype='<U1'), array(['n'], dtype='<U1'),
         array(['b'], dtype='<U1'), ..., array(['n'], dtype='<U1'),
         array(['b'], dtype='<U1'), array(['n'], dtype='<U1')]],
       dtype=object),
 'Time': array([[3107, 2930, 1874, ..., 1226, 2783, 1017]], dtype=int64),
 'Correct': array([[1, 1, 1, ..., 1, 1, 1]], dtype=int64),
 'Age': array([[32, 32, 32, ..., 20, 20, 20]], dtype=int64),
 'Sex': array([[array(['M'], dtype='<U1'), array(['M'], dtype='<U1'),
         array(['M'], dtype='<U1'), ..., array(['F'], dtype='<U1'),
         array(['F'], dtype='<U1'), array(['F'],

So... how could one rebuild the DataFrame?

Load the Data to a dictionary...

In [55]:
data = scipy.io.loadmat('mental_rot.mat')
data.keys()

dict_keys(['__header__', '__version__', '__globals__', 'Subject', 'Trial', 'Angle', 'Matching', 'Response', 'Time', 'Correct', 'Age', 'Sex'])

...Strip out the private variables (the keys that start with `__`) and flatten each array back to one dimension...

In [56]:
data2 = {key: value.flatten() for key, value in data.items() if not key.startswith('__')}
data2.keys()

dict_keys(['Subject', 'Trial', 'Angle', 'Matching', 'Response', 'Time', 'Correct', 'Age', 'Sex'])

...Pass it to the DataFrame constructor...

In [59]:
pd.DataFrame(data2).head()


Unnamed: 0,Subject,Trial,Angle,Matching,Response,Time,Correct,Age,Sex
0,49,1,0,0,[n],3107,1,32,[M]
1,49,2,150,0,[n],2930,1,32,[M]
2,49,3,150,1,[b],1874,1,32,[M]
3,49,4,100,1,[b],3793,1,32,[M]
4,49,5,50,1,[b],2184,1,32,[M]
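
The text columns still come back wrapped in one-element arrays (`[n]`, `[M]`). One possible way to unwrap them is sketched below on a small made-up frame; `unwrap_cells` is a hypothetical helper written for this example, not a scipy or pandas function, and it assumes every cell holds exactly one value.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io

# Hypothetical stand-in for the workshop DataFrame.
demo = pd.DataFrame({"Angle": [0, 150, 150], "Response": ["n", "n", "b"]})
mdict = {
    "Angle": demo["Angle"].to_numpy(),
    # An object array of strings is written as a Matlab cell array.
    "Response": demo["Response"].to_numpy(dtype=object),
}
path = os.path.join(tempfile.mkdtemp(), "demo_cells.mat")
scipy.io.savemat(path, mdict)
raw = scipy.io.loadmat(path)


def unwrap_cells(column):
    """Flatten a loaded array and unwrap one-element cells into scalars."""
    flat = column.flatten()
    if flat.dtype == object:
        # Assumes each cell is a one-element array, as in the output above.
        return np.array([cell.item() for cell in flat], dtype=object)
    return flat


rebuilt = pd.DataFrame(
    {key: unwrap_cells(value) for key, value in raw.items() if not key.startswith("__")}
)
print(rebuilt)
```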


...Wish you had just used another format instead! ;-)

## Working with Rds files with `pyreadr`

| Code | Description |
| :-- | :-- |
| **`pyreadr.write_rds(filename, var)`** | Write to an `.Rds` file |
| **`pyreadr.read_r(filename)`** | Read from an `.Rds` file |



`pyreadr`: 

  - supports R data frames, Tibbles, Vectors, Matrices, Arrays, and Tables.
  - doesn't support R lists and R S4 objects.

In [7]:
df.head()

Unnamed: 0,Subject,Trial,Angle,Matching,Response,Time,Correct,Age,Sex
0,49,1,0,0,n,3107,1,32,M
1,49,2,150,0,n,2930,1,32,M
2,49,3,150,1,b,1874,1,32,M
3,49,4,100,1,b,3793,1,32,M
4,49,5,50,1,b,2184,1,32,M


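Writing and reading works pretty well! `read_r()` returns a dictionary of DataFrames; an `.Rds` file holds a single unnamed object, so it is stored under the key `None`. In the output below, the integer columns come back as floats.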

In [9]:
pyreadr.write_rds("test.Rds", df)
rdf = pyreadr.read_r("test.Rds")
rdf[None].head()

rownames,Subject,Trial,Angle,Matching,Response,Time,Correct,Age,Sex
1,49.0,1.0,0.0,0.0,n,3107.0,1.0,32.0,M
2,49.0,2.0,150.0,0.0,n,2930.0,1.0,32.0,M
3,49.0,3.0,150.0,1.0,b,1874.0,1.0,32.0,M
4,49.0,4.0,100.0,1.0,b,3793.0,1.0,32.0,M
5,49.0,5.0,50.0,1.0,b,2184.0,1.0,32.0,M
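
In the output above, the numeric columns are floats (`49.0`) where the original frame had integers (`49`). If the integer types matter, one option is to cast them back using the original frame as a template. The sketch below uses plain pandas, with a made-up `returned` frame standing in for `rdf[None]`, so it runs without writing an `.Rds` file.

```python
import pandas as pd

# Hypothetical miniature of the original DataFrame.
original = pd.DataFrame({"Subject": [49, 49], "Time": [3107, 2930], "Sex": ["M", "M"]})

# Stand-in for rdf[None]: same values, but numeric columns stored as floats.
returned = original.astype({"Subject": "float64", "Time": "float64"})

# Cast every column that was an integer in the original back to its old dtype.
int_dtypes = {
    col: original[col].dtype
    for col in original.columns
    if pd.api.types.is_integer_dtype(original[col])
}
restored = returned.astype(int_dtypes)
print(restored.dtypes)
```

This only works when the float columns contain no missing values, since NaN cannot be cast to a plain integer dtype.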
