# Installing (or upgrading) Pandas

The command below installs the pandas library using pip, Python's package installer.\
If pandas is not yet installed, pip downloads and installs the latest version along with its required dependencies.

`pip install pandas`

Adding the `--upgrade` flag installs pandas if it is missing and otherwise updates an existing installation to the latest version, so that you get the newest features and bug fixes.

`pip install --upgrade pandas`
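To confirm which version ended up installed, the version can be read from Python itself. The following is a minimal sketch using the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical and only used for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if pandas is installed, otherwise None.
print(installed_version("pandas"))
```

If the printed value is `None`, pandas is not installed in the environment that is running this code.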

### `ModuleNotFoundError` and Pandas dependencies

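When using pandas, you may encounter a `ModuleNotFoundError`. This error means that Python cannot find a module it was asked to import: either pandas itself is not installed in the current environment, or a pandas feature needs an optional dependency that is missing.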

When you install the pandas library for Python, it requires other libraries to function correctly. These required libraries are called dependencies.
Dependencies are external libraries that provide additional functionality or capabilities that the main library (in this case, Pandas) relies on to operate.

When you install Pandas using `pip`, Python's package installer, it automatically installs any required dependencies. However, optional dependencies are not installed by default and must be installed separately if needed.

[You can read more about Pandas dependencies here.](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#optional-dependencies)
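As an illustration of how a missing optional dependency shows up, the sketch below tries to import a module and reports when it is absent. The helper `try_import` is hypothetical, and `openpyxl` is used here as one example of an optional dependency (pandas uses it for `.xlsx` files); whether you need it depends on your own workflow.

```python
import importlib


def try_import(module_name):
    """Import a module by name; return None (with a hint) if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        # error.name holds the name of the module that could not be found.
        print(f"Module '{error.name}' is not installed in this environment.")
        return None


# 'openpyxl' is an optional dependency that pandas uses for .xlsx files.
openpyxl = try_import("openpyxl")
```

If the message is printed, the module can be installed with pip in the same way as pandas itself.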

## Import Pandas

Aliasing pandas as `pd` is a widely adopted convention that simplifies the syntax for accessing its functionalities.\
After this statement, you can use `pd` to access all the functionalities provided by the pandas library.

In [22]:
# This line imports the pandas library and aliases it as 'pd'.

import pandas as pd

## Representation of a Pandas `DataFrame`

![Representation of a Pandas DataFrame](images/01_table_dataframe.svg)

## Creating our first `DataFrame`

We start by creating three lists of equal length (i.e., containing the same number of elements).\
These lists will be used as columns for a `DataFrame`, with each list representing a column and each element within the list representing a row in that column.

In [25]:
# Create three lists named 'name', 'age', and 'sex'.

name = ["Braund", "Allen", "Bonnel"]
age = [22, 35, 58]
sex = ["male", "male", "female"]

We use lists `name`, `age`, and `sex` to fill in the columns.\
Each list corresponds to a column in the DataFrame.\
`Name`, `Age`, and `Sex` are the titles of these columns.

In [26]:
# Create a DataFrame named 'df' based on three lists.

df = pd.DataFrame({'Name': name, 'Age': age, 'Sex': sex})
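The requirement that the lists have equal length matters in practice. The sketch below, which uses a hypothetical list `age_too_short` with one element missing, shows that pandas refuses to build the `DataFrame` and raises a `ValueError` instead.

```python
import pandas as pd

names_full = ["Braund", "Allen", "Bonnel"]
age_too_short = [22, 35]  # hypothetical list with one element missing

try:
    pd.DataFrame({"Name": names_full, "Age": age_too_short})
    error_message = None
except ValueError as error:
    # pandas cannot line up 3 names with only 2 ages.
    error_message = str(error)
    print(f"Could not create the DataFrame: {error_message}")
```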

### Creating a `DataFrame` in pandas is similar to creating a `dictionary`.

The keys in the dictionary (`dict`) become the column names, while the values, which are lists or arrays, form the columns' data.\
For more information on dictionaries in Python, see:\
[https://www.geeksforgeeks.org/python-dictionary/](https://www.geeksforgeeks.org/python-dictionary/)
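To make the analogy concrete, here is a small sketch that builds the dictionary first and only then passes it to `pd.DataFrame`. The names `passengers` and `df_from_dict` are chosen for this illustration and do not appear elsewhere in the notebook.

```python
import pandas as pd

# A plain Python dictionary: each key maps to a list of values.
passengers = {
    "Name": ["Braund", "Allen", "Bonnel"],
    "Age": [22, 35, 58],
}

# Passing the dictionary to pd.DataFrame turns keys into column names.
df_from_dict = pd.DataFrame(passengers)

print(list(passengers.keys()))     # the dictionary keys
print(list(df_from_dict.columns))  # the resulting column names
```

Both `print` calls show the same names, since the column names are taken directly from the dictionary keys.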

In [4]:
# Display the DataFrame 'df'.

df

| | Name | Age | Sex |
|---|---|---|---|
| 0 | Braund | 22 | male |
| 1 | Allen | 35 | male |
| 2 | Bonnel | 58 | female |


In spreadsheet software, the table representation of our data would look very similar:

![Table representation in spreadsheet software](images/01_table_spreadsheet.png)

In [27]:
# Check the type of the 'df' object using the 'type()' function.

type(df)

pandas.core.frame.DataFrame

## Attributes

We can use the `shape` attribute to determine the dimensions of the `DataFrame` 'df'.\
It returns a tuple representing the number of rows and columns (rows, columns).

In [28]:
df.shape

(3, 3)

We can also use the `dtypes` attribute to view the data type of each column in the 'df' `DataFrame`.\
It reports, for each column, whether the values are stored as integer, float, or object (typically strings).

In [29]:
df.dtypes

Name    object
Age      int64
Sex     object
dtype: object

Note that `shape` and `dtypes` are written without parentheses `()`. Both are attributes of `DataFrame` and `Series`. (`Series` will be explained later.)

Attributes represent a characteristic of a `DataFrame`/`Series`, whereas methods (which require parentheses `()`) do something with the `DataFrame`/`Series`.
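The difference between attributes and methods can be sketched as follows. The `DataFrame` `df_example` is a small stand-in created only for this illustration, and `head` is used as one example of a method.

```python
import pandas as pd

df_example = pd.DataFrame(
    {"Name": ["Braund", "Allen", "Bonnel"], "Age": [22, 35, 58]}
)

# Attribute: no parentheses, returns a stored characteristic.
n_rows, n_columns = df_example.shape

# Method: parentheses required, performs an action (here: select the first 2 rows).
first_two = df_example.head(2)

# Without parentheses, a method is only referenced, not called.
not_called = df_example.head

print(n_rows, n_columns)
print(len(first_two))
print(callable(not_called))
```

Forgetting the parentheses on a method does not raise an error. It simply returns the method object itself, which is a common source of confusion.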

### Transposing a `DataFrame`

The `transpose` method swaps the rows and columns of a `DataFrame`. Here, we store the result in 'df_transposed'.\
Transposing is useful for reshaping data, making it easier to compare rows or apply certain operations that are typically column-based.


In [30]:
# Transpose the DataFrame 'df' using the 'transpose()' method.

df_transposed = df.transpose()

In [9]:
# Display the DataFrame 'df_transposed'.

df_transposed

| | 0 | 1 | 2 |
|---|---|---|---|
| Name | Braund | Allen | Bonnel |
| Age | 22 | 35 | 58 |
| Sex | male | male | female |
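Two details about transposing are worth seeing in code. The sketch below uses a small stand-in `DataFrame` named `df_example` (an assumption for this illustration) to show the `.T` shorthand and what happens to the data types.

```python
import pandas as pd

df_example = pd.DataFrame({"Name": ["Braund", "Allen"], "Age": [22, 35]})

# '.T' is an attribute that gives the same result as the 'transpose()' method.
same_result = df_example.T.equals(df_example.transpose())
print(same_result)

# After transposing, each column mixes text and numbers, so pandas
# stores the columns with the generic 'object' dtype.
print(df_example.T.dtypes)

# Transposing twice restores the original row/column layout.
print(df_example.T.T.shape == df_example.shape)
```

Because the transposed columns hold mixed types, numeric operations on them may no longer behave as they did on the original 'Age' column.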


### Renaming columns

We can rename the columns of our `DataFrame` after creation.\
This is done by assigning a new list of column names to `df.columns`.\
The new column names are `Names`, `Age`, and `Sex`, in that order.

In [31]:
# Rename the columns of the DataFrame 'df'.

df.columns = ['Names', 'Age', 'Sex']

The `rename` method is useful for renaming one or more columns selectively, without having to specify the entire set of column names:

In [32]:
# Rename the 'Age' column to 'Ages' in the DataFrame 'df'.

df = df.rename(columns={'Age': 'Ages'})

In [12]:
# Our DataFrame now looks like this:

df

| | Names | Ages | Sex |
|---|---|---|---|
| 0 | Braund | 22 | male |
| 1 | Allen | 35 | male |
| 2 | Bonnel | 58 | female |
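The `rename` call above assigns its result back to `df`. The sketch below, using a stand-in `DataFrame` named `df_example`, illustrates why: `rename` returns a new `DataFrame` and leaves the original untouched.

```python
import pandas as pd

df_example = pd.DataFrame({"Name": ["Braund", "Allen"], "Age": [22, 35]})

# 'rename' returns a new DataFrame; 'df_example' itself is left unchanged.
df_renamed = df_example.rename(columns={"Age": "Ages"})

print(list(df_example.columns))  # original column names
print(list(df_renamed.columns))  # renamed copy

# Keys that do not match any column are silently ignored by default.
df_unchanged = df_example.rename(columns={"Fare": "Fares"})
print(list(df_unchanged.columns))
```

A misspelled column name in the mapping therefore produces no error, so it is worth checking the column names after renaming.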


## Each column in a `DataFrame` is a `Series`

![DataFrame Series](images/01_table_series.svg)

When we access a column in a `DataFrame`, this actually returns a `Series` object containing all the data in that column.

In [34]:
df['Ages']

0    22
1    35
2    58
Name: Ages, dtype: int64

In [14]:
# Check the type of the 'Ages' column in 'df' using the 'type()' function.

type(df['Ages'])

pandas.core.series.Series
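How a column is selected determines what comes back. As a minimal sketch with a stand-in `DataFrame` named `df_example`, single brackets return a `Series`, while a list of column names returns a `DataFrame`.

```python
import pandas as pd

df_example = pd.DataFrame({"Name": ["Braund", "Allen"], "Age": [22, 35]})

# Single brackets with one column name return a Series.
ages_series = df_example["Age"]

# Double brackets (a list of column names) return a DataFrame.
ages_frame = df_example[["Age"]]

print(type(ages_series))
print(type(ages_frame))
print(ages_series.shape, ages_frame.shape)
```

The shapes differ as well: the `Series` is one-dimensional, while the one-column `DataFrame` still has rows and columns.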

### We can also create our own `Series`

We can create and name a `Series` in the following way.\
The `name` parameter assigns the name 'Fare' to the Series.

In [33]:
# Create a pandas Series named 'fare' with specified values.

fare = pd.Series([7.2500, 71.2833, 7.9250], name='Fare')

In [16]:
# Display the 'fare' Series.
# This outputs the values along with their index positions and the name of the Series.

fare

0     7.2500
1    71.2833
2     7.9250
Name: Fare, dtype: float64

In [17]:
# Check the data type of 'fare' using the 'type()' function.

type(fare)

pandas.core.series.Series
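Besides `name`, a `Series` can also be given its own index labels. The sketch below is an illustration only: using passenger names as the index is an assumption made for this example and is not done elsewhere in the notebook.

```python
import pandas as pd

# Create a Series with custom index labels instead of the default 0, 1, 2.
fare_example = pd.Series(
    [7.25, 71.2833, 7.925],
    index=["Braund", "Allen", "Bonnel"],
    name="Fare",
)

print(fare_example.name)         # the name of the Series
print(list(fare_example.index))  # the index labels
print(fare_example["Allen"])     # look up a value by its label
```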

## Appending a `Series` to an existing `DataFrame`

We can add a `Series` as a new column to a `DataFrame`, extending it horizontally.\
Here, the name of the 'fare' `Series` ('Fare') becomes the column name in the updated `DataFrame`.

In [35]:
# Concatenate the 'fare' Series to the 'df' DataFrame along the columns (axis=1).

df = pd.concat([df, fare], axis=1)

In [19]:
# Display the updated DataFrame 'df'.

df

| | Names | Ages | Sex | Fare |
|---|---|---|---|---|
| 0 | Braund | 22 | male | 7.25 |
| 1 | Allen | 35 | male | 71.2833 |
| 2 | Bonnel | 58 | female | 7.925 |
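The concatenation above works cleanly because `df` and `fare` share the same index (0, 1, 2). The sketch below shows what happens otherwise, using a hypothetical `Series` named `fare_shifted` whose index is offset by one and whose last value is made up for illustration.

```python
import pandas as pd

df_example = pd.DataFrame({"Names": ["Braund", "Allen", "Bonnel"]})  # index 0, 1, 2

# Hypothetical Series with index 1, 2, 3 (illustrative values).
fare_shifted = pd.Series([71.2833, 7.925, 8.05], index=[1, 2, 3], name="Fare")

# With axis=1, rows are matched by index label, not by position.
combined = pd.concat([df_example, fare_shifted], axis=1)
print(combined)

# Labels present on only one side are filled with missing values (NaN).
missing_fares = combined["Fare"].isna().sum()
missing_names = combined["Names"].isna().sum()
print(missing_fares, missing_names)
```

When columns come from different sources, it is therefore worth checking that their indexes match before concatenating.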


### Create a new column based on data in an existing column

We can also create a new column based on the data in an existing column.

Here, we create a new column 'Age_in_3_years' in the `DataFrame` 'df'.\
This column is calculated by adding 3 to each value in the 'Ages' column.

In [36]:
df['Age_in_3_years'] = df['Ages'] + 3

In [21]:
# Display the updated DataFrame 'df'.

df

| | Names | Ages | Sex | Fare | Age_in_3_years |
|---|---|---|---|---|---|
| 0 | Braund | 22 | male | 7.25 | 25 |
| 1 | Allen | 35 | male | 71.2833 | 38 |
| 2 | Bonnel | 58 | female | 7.925 | 61 |
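The same kind of calculation can also be written with the `assign` method, which returns a new `DataFrame` rather than modifying the existing one. This is a minimal sketch on a stand-in `DataFrame` named `df_example`; the column 'Age_in_months' is a hypothetical example.

```python
import pandas as pd

df_example = pd.DataFrame({"Ages": [22, 35, 58]})

# Arithmetic on a column is applied element-wise to every row.
df_example["Age_in_3_years"] = df_example["Ages"] + 3

# 'assign' returns a new DataFrame with the extra column;
# 'df_example' itself does not gain the 'Age_in_months' column.
df_with_months = df_example.assign(Age_in_months=df_example["Ages"] * 12)

print(list(df_example.columns))
print(list(df_with_months.columns))
```

Which form to use is largely a matter of style: direct assignment changes the `DataFrame` in place, while `assign` keeps the original intact.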


## Exercise:
* Create a new column called 'Fare_in_DKK' based on the column 'Fare'.
* Assume that the original fares are in GBP and that the exchange rate is £1 = 8.7 DKK.

<details>
  <summary>Click to reveal solution</summary>
  <br/>
    
`df['Fare_in_DKK'] = df['Fare'] * 8.7`

This solution creates a new column in the `DataFrame` named 'Fare_in_DKK', which contains the fare prices converted from GBP to DKK using the given exchange rate.<br/>
Each fare value in GBP is multiplied by the exchange rate to obtain the corresponding fare value in DKK.

</details>

## REMEMBER

* Import the package with `import pandas as pd`.

* A table of data is stored as a pandas `DataFrame`.

* The `shape` and `dtypes` attributes are convenient for a first check.

* Each column in a `DataFrame` is a `Series`.

* We can append `Series` as columns to an existing `DataFrame`.