# Read & Write (Tabular) Data with Pandas

In [1]:
import pandas as pd



*I want to analyze the Titanic passenger data, available as a CSV file.*

pandas provides the `read_csv()` function to read data stored as a CSV file into a pandas DataFrame. pandas supports many different file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), each with a reader function whose name starts with the prefix `read_*`.
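The `read_*`/`to_*` naming convention can be inspected directly. The sketch below lists the reader functions in the top-level `pandas` namespace and the writer methods on `DataFrame`; the exact lists depend on the installed pandas version.

```python
import pandas as pd

# Reader functions live at the top level of the pandas namespace
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

# Writer methods live on the DataFrame class itself
writers = sorted(name for name in dir(pd.DataFrame) if name.startswith("to_"))

print(readers)
print(writers)
```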

- Always check the data after reading it in. When a long DataFrame is displayed, only the first and last 5 rows are shown by default.
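Reading and checking can be tried without the Titanic file. This is a minimal sketch, assuming a few made-up rows in a Titanic-like layout and using `io.StringIO` as a stand-in for a CSV file on disk (`read_csv()` accepts file-like objects as well as paths):

```python
import io
import pandas as pd

# Hypothetical sample rows in a Titanic-like layout (not real passenger data);
# io.StringIO stands in for a file on disk so the example is self-contained.
csv_text = """PassengerId,Survived,Pclass,Name,Age
1,0,3,Passenger A,22.0
2,1,1,Passenger B,38.0
3,1,3,Passenger C,
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Check what was read: number of rows/columns, column names, first rows
print(sample.shape)
print(list(sample.columns))
print(sample.head())
```

The empty `Age` field in the last row is read as a missing value (`NaN`), which is exactly the kind of thing a first check should reveal.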

In [None]:
titanic = pd.read_csv("../../data/titanic.csv")

I want to **see the first 8 rows** of a pandas DataFrame.



In [None]:
titanic.head(8)

To see the first N rows of a DataFrame, use the `head()` method. Pass the desired number of rows from the top as the argument, in this case 8; the column names are not counted as a row. Without an argument, `head()` shows the first 5 rows.
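A small illustration of the default and explicit row counts, using a hypothetical 12-row DataFrame rather than the Titanic data:

```python
import pandas as pd

# Hypothetical 12-row DataFrame, used only to illustrate head()
df = pd.DataFrame({"n": range(12)})

print(len(df.head()))   # → 5 (default)
print(len(df.head(8)))  # → 8
```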


Interested in the last N rows instead? pandas also provides a `tail()` method. For example, `titanic.tail(10)` will return *the last 10 rows* of the DataFrame.
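`tail()` mirrors `head()`. Note that the selected rows keep their original index labels, as this sketch on a hypothetical 12-row DataFrame shows:

```python
import pandas as pd

# Hypothetical 12-row DataFrame, used only to illustrate tail()
df = pd.DataFrame({"n": range(12)})

last = df.tail(3)
print(last.index.tolist())  # → [9, 10, 11]; the original row labels are kept
print(len(df.tail()))       # → 5 (default, like head())
```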

We can check how pandas has interpreted the column data types by requesting the pandas `dtypes` attribute:


In [None]:
titanic.dtypes

For each column, the data type is listed.
- Here, we have integers (`int64`), floats (`float64`) and strings (`object`).
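To see how each kind of value maps to a dtype, here is a sketch with a hypothetical three-column table. In pandas 2.x text columns are reported as `object`; newer pandas versions may report a dedicated string dtype instead, so the checks below use the `pandas.api.types` helpers rather than exact dtype names.

```python
import pandas as pd

# Hypothetical mini-table with one column of each kind of data
df = pd.DataFrame({
    "Pclass": [3, 1, 3],            # whole numbers -> integer dtype
    "Fare": [7.25, 71.28, 7.92],    # real numbers -> float dtype
    "Name": ["A", "B", "C"],        # text -> object (or a string dtype in newer pandas)
})

print(df.dtypes)
```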

NOTE: When asking for the `dtypes`, no parentheses are used! `dtypes` is an attribute of a DataFrame and Series, and attributes of a DataFrame or Series do not need parentheses. **Attributes represent a characteristic** of a DataFrame/Series, whereas **methods (which require parentheses) do something with the DataFrame/Series**, as introduced in the first tutorial.
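The difference between attributes and methods can be made concrete with `callable()`. In this sketch (hypothetical data), forgetting the parentheses on `head` gives back the method object itself rather than any rows:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Attributes: no parentheses, they simply hold a value
print(df.dtypes)   # a Series with one dtype per column
print(df.shape)    # → (3, 2)

# Methods: called with parentheses, they compute and return a result
print(df.head(2))

# Without parentheses, a method is just a callable object, not its result
print(callable(df.head))    # → True
print(callable(df.dtypes))  # → False
```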

Your colleague has requested the Titanic data as a spreadsheet... no problem:

In [None]:
titanic.to_excel("../data/titanic.xlsx", sheet_name="passengers", index=False)

`read_*` functions are used to **read** data into pandas;
`to_*` methods are used to **store** data.

Here, if `sheet_name` weren't specified, the sheet name would default to `Sheet1`.

`index=False` saves the spreadsheet without the row index labels (the numbered rows).
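Writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed, so this sketch shows the same `index` behaviour with `to_csv()`, which accepts the same parameter. The data is hypothetical. With the default `index=True`, the row labels become an extra unnamed column that reappears as `Unnamed: 0` when the file is read back:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [22, 38]})

# Default (index=True): row labels are written as an extra, unnamed first column
with_index = df.to_csv()
# index=False: only the data columns are written
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # → ,Name,Age
print(without_index.splitlines()[0])  # → Name,Age

# Reading the first version back produces an extra "Unnamed: 0" column
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))  # → ['Unnamed: 0', 'Name', 'Age']
```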

The equivalent read function, `read_excel()`, will reload the data into a DataFrame:



In [None]:
titanic = pd.read_excel("../data/titanic.xlsx", sheet_name="passengers")

In [None]:
titanic.tail(19)

Now, we're interested in a technical summary of our DataFrame...

In [None]:
titanic.info()

The `info()` method provides technical information about a DataFrame:
- It is indeed a DataFrame.

- There are 891 entries, i.e. 891 rows.

- Each row has a row label (aka the index) with values ranging from 0 to 890.

- The table has 12 columns. Most columns have a value for each of the rows (all 891 values are non-null). Some columns do have missing values and fewer than 891 non-null values.

- The columns Name, Sex, Ticket, Cabin and Embarked consist of textual data (strings, aka object). The other columns are numerical data, some of them whole numbers (aka integer) and others real numbers (aka float).

- The kind of data (characters, integers, …) in the different columns is summarized by listing the dtypes.

- The approximate amount of RAM used to hold the DataFrame is provided as well.
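`info()` prints its report by default; its `buf` parameter redirects the report into any writable text buffer. The non-null counts it lists can also be obtained as a Series with `count()`, and missing values counted with `isna().sum()`. A sketch on a hypothetical three-row table with one missing `Age`:

```python
import io
import pandas as pd

# Hypothetical table with one missing value in "Age"
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [22.0, None, 26.0]})

# Capture the info() report as a string instead of printing it directly
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# Non-null values per column (the "Non-Null Count" in the report)
print(df.count())
# Missing values per column
print(df.isna().sum())
```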

**REMEMBER!**
- Getting data into pandas from many different file formats or data sources is supported by `read_*` functions.

- Exporting data out of pandas is provided by different `to_*` methods.

- The `head`/`tail`/`info` methods and the `dtypes` attribute are convenient for a first check.