# Lecture 3: Pandas [Input/Output (I/O)](https://pandas.pydata.org/pandas-docs/stable/reference/io.html)

* CSV
    * [CSV Input](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
    * [CSV Output](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
* Excel
    * [Excel Input](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)
    * [Excel Output](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)

## Imports

In [1]:
import pandas as pd

## CSV

### [CSV Input](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)

Usually, we don't create `DataFrame`s ourselves, but read them from external sources, such as a CSV file or even a database. (Databases are beyond the scope of this lecture, but you can read more [here](https://pandas.pydata.org/pandas-docs/stable/reference/io.html#sql) if you're interested.)

To read CSV files, we use `pd.read_csv(...)`:

In [2]:
df = pd.read_csv('students.csv')
df

| | student | programme | enrolment |
|---|---|---|---|
| 0 | Bob | BIM | 2008.0 |
| 1 | Jake | MiM | 2012.0 |
| 2 | Lisa | IM | 2004.0 |
| 3 | Sue | SCM | NaN |


We see that the column names are imported as well. The data types also look correct:

In [3]:
df.dtypes

student       object
programme     object
enrolment    float64
dtype: object

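`enrolment` is `float64` rather than an integer type because Sue's enrolment year is missing, and pandas stores the missing value as `NaN`, which is a float.

Note that we used `Series.dtype` to check the data type of a `Series`. Since `DataFrame`s usually contain more than one data type, the `DataFrame` equivalent is `df.dtypes`, which lists one data type per column.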
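As a small illustration of the difference between the two attributes, the sketch below rebuilds the student data from an in-memory string and inspects both. Using `io.StringIO` as a stand-in for `students.csv` is an assumption made only so that the example runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for students.csv (assumption: same content as the file above).
csv_text = """student,programme,enrolment
Bob,BIM,2008
Jake,MiM,2012
Lisa,IM,2004
Sue,SCM,
"""

df = pd.read_csv(io.StringIO(csv_text))

# A single column is a Series and has exactly one data type.
enrolment_dtype = df["enrolment"].dtype
print(enrolment_dtype)

# A DataFrame reports one data type per column.
column_dtypes = df.dtypes
print(column_dtypes)
```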

`pd.read_csv(...)` offers many options to adapt its behaviour to specific (or unusual) CSV files. For instance, if the CSV file did not contain a header row, we would set `header=None`. Our file does have a header, so doing this here makes little sense: the header row ends up as the first row of data, and the columns are simply numbered:

In [4]:
pd.read_csv('students.csv', header=None)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | student | programme | enrolment |
| 1 | Bob | BIM | 2008 |
| 2 | Jake | MiM | 2012 |
| 3 | Lisa | IM | 2004 |
| 4 | Sue | SCM | NaN |
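For a file that really has no header row, a minimal sketch could look as follows. The headerless data and the column labels passed via `names` are made up for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content without a header row.
headerless_csv = """Bob,BIM,2008
Jake,MiM,2012
Lisa,IM,2004
"""

# header=None tells pandas that the first line is data, not column names.
# The columns are then labelled 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(headerless_csv), header=None)
print(df_no_header)

# names supplies our own column labels for the headerless data.
df_named = pd.read_csv(
    io.StringIO(headerless_csv),
    header=None,
    names=["student", "programme", "enrolment"],
)
print(df_named)
```

Without `header=None`, pandas would treat Bob's row as the column names and return only two rows of data.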


If you work with Dutch number formatting, which uses a period as the thousands separator and a comma as the decimal mark (and is strongly discouraged), you may need to set `thousands` and `decimal`. Microsoft Excel is also known for peculiar CSV formats, so it's often better to import Excel files directly rather than converting them to CSVs.
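To make the `thousands` and `decimal` options concrete, here is a minimal sketch. The product data is invented, and the semicolon separator (`sep=';'`) is an assumption about how such files are commonly laid out, since the comma is already taken by the decimal mark.

```python
import io

import pandas as pd

# Hypothetical file with Dutch number formatting:
# '.' separates thousands and ',' marks the decimals.
dutch_csv = """product;price;units_sold
bike;1.249,95;12.500
lock;24,50;1.030
"""

df_dutch = pd.read_csv(
    io.StringIO(dutch_csv),
    sep=";",        # fields are separated by semicolons, not commas
    thousands=".",  # strip '.' as thousands separator
    decimal=",",    # treat ',' as the decimal mark
)

print(df_dutch)
print(df_dutch.dtypes)
```

Without these two options, pandas would not recognise values such as `1.249,95` as numbers and would import the `price` column as text.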

There are more options than we have time to go through. Generally speaking, if a CSV file does not look like the [standard](https://tools.ietf.org/html/rfc4180) layout below, you may need to adjust how it is parsed.

```
student,programme,enrolment
Bob,BIM,2008
Jake,MiM,2012
Lisa,IM,2004
Sue,SCM,NULL
```
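As a quick check, the sketch below parses exactly this text with the default settings. `io.StringIO` is used as a stand-in for a file so that the example runs on its own. Note that pandas treats the string `NULL` as a missing value by default.

```python
import io

import pandas as pd

# The standard layout from above, held in memory instead of a file.
standard_csv = """student,programme,enrolment
Bob,BIM,2008
Jake,MiM,2012
Lisa,IM,2004
Sue,SCM,NULL
"""

# No extra options are needed for this layout.
df_standard = pd.read_csv(io.StringIO(standard_csv))
print(df_standard)

# NULL is read as a missing value (NaN), not as the text "NULL".
missing_mask = df_standard["enrolment"].isna()
print(missing_mask)
```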

#### HackerRank

Reading files on HackerRank works the same way. You can assume that a file is in the same directory as the code, so you can use `pd.read_csv('filename.csv')`.

### [CSV Output](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)

To write a `DataFrame` to a CSV file, we use `DataFrame.to_csv(filename)`. Again, there are many options, but the default output format follows the RFC standard.
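The following sketch shows what the output looks like. To avoid creating a file, it calls `to_csv` without a filename, in which case pandas returns the CSV text as a string. The small `students` frame is made up for illustration. One option worth knowing is `index`: by default, the row index is written as an unnamed first column, and `index=False` leaves it out.

```python
import pandas as pd

# Small illustrative DataFrame (not read from the lecture's file).
students = pd.DataFrame(
    {
        "student": ["Bob", "Jake"],
        "programme": ["BIM", "MiM"],
        "enrolment": [2008, 2012],
    }
)

# Default: the row index becomes the first column of the output.
with_index = students.to_csv()
print(with_index)

# index=False writes only the actual columns.
without_index = students.to_csv(index=False)
print(without_index)
```

To write to disk instead, pass a filename as the first argument, for example `students.to_csv('students_out.csv', index=False)` (the filename here is hypothetical).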

## Excel

### [Excel Input](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)

To read an Excel sheet into a `DataFrame`, we use `pd.read_excel(filename)`. By default, pandas reads the first sheet in the workbook, but you can also specify the `sheet_name`, either as the sheet's name or as its zero-based position.

### [Excel Output](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)

To write a `DataFrame` to an Excel file, we use `DataFrame.to_excel(filename)`. The option that you will probably use most often is `sheet_name`, which allows you to specify to which sheet in the Excel workbook the `DataFrame` should be written.
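Writing and reading can be combined into a small round trip, sketched below. The example assumes that the optional `openpyxl` package is installed, which pandas needs for `.xlsx` files, and skips the round trip otherwise. The helper `excel_round_trip`, the sheet name `Students`, and the temporary filename are all made up for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd


def excel_round_trip(df, sheet_name):
    """Write df to a named sheet in a temporary workbook and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "students.xlsx"
        # index=False keeps the row index out of the sheet.
        df.to_excel(path, sheet_name=sheet_name, index=False)
        # Read the same sheet back by name.
        return pd.read_excel(path, sheet_name=sheet_name)


students = pd.DataFrame(
    {
        "student": ["Bob", "Jake", "Lisa"],
        "programme": ["BIM", "MiM", "IM"],
        "enrolment": [2008, 2012, 2004],
    }
)

# Assumption: openpyxl is the installed Excel engine; skip if it is missing.
if importlib.util.find_spec("openpyxl") is not None:
    restored = excel_round_trip(students, "Students")
    print(restored)
else:
    restored = None
    print("openpyxl is not installed; skipping the Excel round trip")
```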

© 2023 Philipp Cornelius