# Pandas Tutorial
https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html

## How do I read and write tabular data?
https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html

To start using pandas, the convention is to import it under the alias `pd`.

In [1]:
import pandas as pd

![alt text](003-data.png)

I want to analyze the Titanic passenger data, available as a CSV file.

In [2]:
titanic = pd.read_csv('titanic.csv')
print(titanic)

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                                Heikkinen, Miss Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ... 

pandas provides the `read_csv()` function to read data stored as a CSV file into a pandas `DataFrame`.

pandas supports many different file formats and data sources out of the box (CSV, Excel, SQL, JSON, ...), each with a reader function named `read_*`.
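As a rough illustration of the shared `read_*` naming, the sketch below reads the same small table from CSV text and from JSON text. The data and the variable names are made up for this example, and `io.StringIO` stands in for files on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
csv_text = "name,age\nAnn,31\nBob,45\n"
json_text = '[{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}]'

# The read_* functions accept a path or a file-like object.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Both calls return a `DataFrame` with the columns `name` and `age`, so the rest of the analysis does not depend on the original file format.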

Always check the data after reading it in. As seen above, displaying a `DataFrame` shows the first and last five rows by default.

To see the first 8 rows, we can use the `.head()` method with the required number of rows as argument. Likewise, to see the last rows, we can use the `.tail()` method in the same way.

In [3]:
print(titanic.head(8))
print(titanic.tail(8))

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   
5            6         0       3   
6            7         0       1   
7            8         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                              Heikkinen, Miss Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   
5                                   Moran, Mr. James    male   NaN      0   
6                            McCarthy, Mr. Timothy J    male  54.0      0   
7                      Palsson, Master Gosta Leonard    mal

Let's check how pandas interpreted the data type of each column using `.dtypes`.

When asking for `dtypes`, no brackets are used, because it is an attribute of a `DataFrame` and `Series`. Attributes represent a characteristic of a `DataFrame` or `Series`, whereas methods (which require brackets) *do* something.


In [4]:
print(titanic.dtypes)

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
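The attribute versus method distinction can be seen side by side in a minimal sketch. The toy `DataFrame` below is invented for illustration, and `shape` is a second attribute that the text above does not mention.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]})

# Attributes: accessed without brackets, they describe the object.
column_types = df.dtypes  # a Series mapping column name -> dtype
dimensions = df.shape     # a (rows, columns) tuple

# Method: called with brackets, it does something and returns a result.
first_two = df.head(2)

print(column_types)
print(dimensions)
print(first_two)
```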


You can export data to other formats. Whereas the `read_*` functions read data into pandas, the `to_*` methods store data. In this case we use the `.to_excel()` method to store the data as an Excel file.

`sheet_name` lets us name the sheet instead of using the default `Sheet1`. Also, setting `index=False` means the row index labels are not saved in the spreadsheet.

In [5]:
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
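To make the effect of `index=False` visible without opening a spreadsheet, here is a small sketch that uses `to_csv()` rather than `to_excel()`. This is a swapped-in method chosen because it can return its output as text; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# With no path given, to_csv() returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row index labels 0, 1
print(without_index)  # only the data columns are written
```

The same `index` parameter exists on `to_excel()`, where leaving it at its default adds an extra leading column of row labels to the sheet.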


We can now read the Excel file back with `read_excel()` and output the first rows.

In [6]:
titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")
print(titanic.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                              Heikkinen, Miss Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


Finally, we can request a technical summary of a `DataFrame` using the `.info()` method.

In [7]:
print(titanic.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
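The trailing `None` in the output above appears because `.info()` prints its report itself and returns `None`, which `print()` then displays. The sketch below, on a hypothetical toy `DataFrame`, shows this and uses the `buf` parameter to capture the report as a string instead.

```python
import io

import pandas as pd

# Hypothetical toy data with one missing value in "age".
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31.0, None]})

buffer = io.StringIO()
result = df.info(buf=buffer)  # writes the report into buffer, not to stdout
summary = buffer.getvalue()

print(result is None)
print(summary)
```

In a notebook or script, calling `titanic.info()` on its own, without `print()`, shows the summary without the extra `None` line.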


## Summary
Getting data into pandas from many different file formats or data sources is supported by `read_*` functions.

Exporting data out of pandas is provided by different `to_*` methods.

The `head`/`tail`/`info` methods and the `dtypes` attribute are convenient for a first check.