# How to load a CSV file?

This notebook shows how to load a [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values) as a [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

The data is stored on [figshare](https://figshare.com/articles/dataset/pregnancy-data/28339535) (see [How to download data from figshare?](./load_figshare_data.ipynb) for details on how to download it).

## Loading data

In [1]:
import polpo.preprocessing.pd as ppd
from polpo.preprocessing.load import FigsharePregnancyDataLoader

In [2]:
loader = FigsharePregnancyDataLoader(
    data_dir="~/.herbrain/data/pregnancy",
    remote_path="28Baby_Hormones.csv",
    use_cache=True,
)

# Chain the loader with a CSV reader, then call the resulting pipeline.
data = (loader + ppd.CsvReader())()

data

,sessionID,estro,prog,lh,gestWeek,stage,EndoStatus,trimester
0,ses-01,,,,-3.0,pre,pilot1,pre
1,ses-02,3.42,0.84,,-0.5,pre,pilot2,pre
2,ses-03,386.0,,,1.0,pre,IVF,pre
3,ses-04,1238.0,,,1.5,pre,IVF,pre
4,ses-05,1350.0,2.94,,2.0,pre,IVF,first
5,ses-06,241.0,8.76,,3.0,preg,Pregnant,first
6,ses-07,,,,9.0,preg,Pregnant,first
7,ses-08,,,,12.0,preg,Pregnant,first
8,ses-09,,,,14.0,preg,Pregnant,second
9,ses-10,4700.0,53.9,1.45,15.0,preg,Pregnant,second
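
For readers without access to the figshare file, the reading step can be approximated with plain pandas. This is a minimal sketch, assuming `ppd.CsvReader` behaves much like `pandas.read_csv` (not confirmed here). The inline `csv_text` is a hypothetical stand-in built from three rows of the output above, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: three rows copied from
# the output above. Empty fields are read as NaN by pandas.
csv_text = """sessionID,estro,prog,lh,gestWeek,stage,EndoStatus,trimester
ses-01,,,,-3.0,pre,pilot1,pre
ses-02,3.42,0.84,,-0.5,pre,pilot2,pre
ses-10,4700.0,53.9,1.45,15.0,preg,Pregnant,second
"""

# read_csv accepts any file-like object, so StringIO works like a path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.isna().sum())
```

Only the last row is complete; the others contain missing hormone measurements, which is why the cleaning step below drops rows with missing values.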


## Manipulate data

NB: most operations are done in place. Use `DfCopy` to work on a copy instead, or set the `inplace` parameter where a step provides one.

In [3]:
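cleaning_pipe = (
    # Work on a copy so that `data` is left untouched.
    ppd.DfCopy()
    # Convert session labels to integers, e.g. "ses-01" -> 1.
    + ppd.UpdateColumnValues("sessionID", lambda entry: int(entry.split("-")[1]))
    # Use the session number as the index.
    + ppd.IndexSetter("sessionID", drop=True, inplace=True)
    # Drop the entry labelled 27.
    + ppd.Drop(27, inplace=True)
    # Drop entries with missing values.
    + ppd.Dropna(inplace=True)
)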

cleaning_pipe(data)

sessionID,estro,prog,lh,gestWeek,stage,EndoStatus,trimester
10,4700.0,53.9,1.45,15.0,preg,Pregnant,second
11,4100.0,56.8,0.87,17.0,preg,Pregnant,second
12,6190.0,70.6,0.93,19.0,preg,Pregnant,second
13,9640.0,54.7,0.62,22.0,preg,Pregnant,second
14,8800.0,64.1,0.73,24.0,preg,Pregnant,second
15,8970.0,61.4,0.73,27.0,preg,Pregnant,third
16,10200.0,74.2,0.69,29.0,preg,Pregnant,third
17,9920.0,83.0,0.77,31.0,preg,Pregnant,third
18,9860.0,95.3,0.83,33.0,preg,Pregnant,third
19,12400.0,103.0,0.59,36.0,preg,Pregnant,third
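
The same cleaning steps can be sketched with plain pandas calls. This is an illustration under assumptions, not the `polpo` implementation: the mapping of each step to a pandas method is inferred from the step names and the output above, and `csv_text` is a hypothetical four-row sample taken from the tables shown in this notebook.

```python
import io

import pandas as pd

# Hypothetical sample: rows copied from the outputs shown in this notebook.
csv_text = """sessionID,estro,prog,lh,gestWeek,stage,EndoStatus,trimester
ses-01,,,,-3.0,pre,pilot1,pre
ses-09,,,,14.0,preg,Pregnant,second
ses-10,4700.0,53.9,1.45,15.0,preg,Pregnant,second
ses-11,4100.0,56.8,0.87,17.0,preg,Pregnant,second
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Copy first so that `raw` is not modified by the steps below.
cleaned = raw.copy()

# "ses-10" -> 10
cleaned["sessionID"] = cleaned["sessionID"].map(
    lambda entry: int(entry.split("-")[1])
)

# Use the session number as the index.
cleaned = cleaned.set_index("sessionID")

# Drop the row labelled 27; errors="ignore" because this small sample
# does not contain it.
cleaned = cleaned.drop(index=27, errors="ignore")

# Keep only rows without missing values.
cleaned = cleaned.dropna()

print(cleaned.index.tolist())
```

Because the work is done on a copy, `raw` still holds the original string labels and the incomplete rows after the cleaning steps run.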
