# Pandas
- Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis tool built on top of Python and NumPy
- It primarily provides two data structures:
**1. Series:**
- A Series is a labeled 1-D array: every value is paired with an index label.
- Unlike a traditional array or list, it is pretty-printed like the cells of a table
- The key difference between a Series and an array or a list is that the index labels of a Series can be set by the user, whereas lists and arrays are always indexed by position
- A Series can be created not only from a list but also from Python's built-in dictionary.
- When a Series is created from a dictionary, the keys are used as the index
**2. DataFrame:**
- DataFrame is the other data structure provided by Pandas.
- It is a 2-D, table-like representation of labeled data, with labels on both the rows and the columns
# Features of Pandas
- Pandas handles missing values easily, showing `NA` or `NaN` for values that are not defined
- It provides powerful functions to group data and aggregate it
- Pandas supports reading and writing data in various formats, including:
    1. CSV
    2. JSON
    3. Excel
    4. SQL
    5. And many more...
- Pandas has built-in support for time series
- Pandas, combined with NumPy and Matplotlib, serves data manipulation and data analysis very well
- Let's get started with coding Pandas and its topics:
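
- As a first taste of the features listed above, here is a minimal sketch of missing-value handling and group-wise aggregation. The `scores` table and its numbers are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing score.
scores = pd.DataFrame({
    "team": ["MI", "RCB", "MI", "RCB"],
    "runs": [45, None, 30, 82],
})

# The missing entry is stored as NaN and can be detected with isna().
missing_count = int(scores["runs"].isna().sum())
print(f"Missing values in 'runs': {missing_count}")

# Group the rows by team and aggregate; sum() skips NaN by default.
totals = scores.groupby("team")["runs"].sum()
print(f"Total runs per team:\n{totals}")
```
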
# Series
- A Series can be created by passing a list, a tuple, or a dictionary. A set cannot be passed directly because it is unordered; convert it to a list first
## List and tuple as series
- If a series is created from a list or a tuple, a default integer index starting at 0 is assigned automatically.
- The index values can be customized.
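
- A small sketch of these cases. It assumes that pandas rejects a raw `set` (recent versions raise a `TypeError` because a set is unordered), so the set is converted with `sorted()` first. The variable names are only illustrative.

```python
from pandas import Series as series

# A tuple behaves like a list: default index 0, 1, 2, ...
from_tuple = series((10, 20, 30))
print(f"Series from a tuple:\n{from_tuple}\n")

# A set is unordered, so it is converted to a sorted list first.
from_set = series(sorted({30, 10, 20}))
print(f"Series from a converted set:\n{from_set}\n")

# The same values with customized index labels.
custom = series([10, 20, 30], index=["x", "y", "z"])
print(f"Series with custom index:\n{custom}")
```
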
## Dictionary as series
- When a series is created from a dictionary, the keys of the dictionary are used as the index.
- An `index` argument can still be passed, but it selects values by key rather than relabeling them: any index label that is not a key of the dictionary gets the value `NaN`.

In [15]:
# import Series as series from "pandas"
from pandas import Series as series

# Series from a list
a = series([1, True, "Hello", 3.14])
print(f"Series from a list:\n{a}\n")

# Series from a dictionary
b = series({
    "name": "Chiku",
    "role": "Batsman",
    "totalRuns": 27599
})
print(f"Series from a dictionary:\n{b}\n")

# Series from a dictionary, attempting to tweak index
c = series({
    "name": "Vadapav",
    "role": "Batsman",
    "totalRuns": 19700 
}, index = [1, 2, 3])
print(f"Series with tweaked indexes:\n{c}")

Series from a list:
0        1
1     True
2    Hello
3     3.14
dtype: object

Series from a dictionary:
name           Chiku
role         Batsman
totalRuns      27599
dtype: object

Series with tweaked indexes:
1    NaN
2    NaN
3    NaN
dtype: object
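
- The all-`NaN` result above happens because none of the labels `1, 2, 3` is a key of the dictionary. As a sketch, passing labels that partly match the keys keeps the matching values; the `"team"` label here is a deliberately missing key chosen for illustration.

```python
from pandas import Series as series

player = {"name": "Chiku", "role": "Batsman", "totalRuns": 27599}

# "name" and "totalRuns" are keys of the dictionary, so their values are kept.
# "team" is not a key, so its value becomes NaN.
d = series(player, index=["name", "totalRuns", "team"])
print(f"Series with partly matching index:\n{d}")
```
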


## Accessing index and values of the series
1. **Values**
- Values of a series can be accessed by using the `values` attribute of the series
2. **Index**
- Index of the series can be accessed by using the `index` attribute of the series

In [16]:
from pandas import Series as series
a = series([1, "Hi", False, 3.33])
print(f"Generated series:\n{a}\n")

# Accessing values
b = a.values
print(f"Values of the series are:\n{b}\n")

# Accessing indexes
c = a.index
print(f"Indexes of the series are:\n{c}")

Generated series:
0        1
1       Hi
2    False
3     3.33
dtype: object

Values of the series are:
[1 'Hi' False 3.33]

Indexes of the series are:
RangeIndex(start=0, stop=4, step=1)


## Customizing index of Series
- Customizing the index is the feature that, in my view, sets a series apart from an array or a list.
- You can pass the `index` argument while creating the series if you want to customize its index
- If that argument is not passed, the series gets the default integer index starting at 0, like a list or an array

In [17]:
from pandas import Series as series
a = series([1, "Hello", False, 3.33], index = ["a", "b", "c", "d"])
print(f"Example of customizing series index:\n{a}\n")

Example of customizing series index:
a        1
b    Hello
c    False
d     3.33
dtype: object



- Here, if you pass more or fewer index elements than the length of the series, it throws an error.
- For example, look below:

In [18]:
from pandas import Series as series
a = series([1, "Hello", False, 3.33], index = ["a", "b", "c", "d", "e"])
print(f"Example of customizing series index:\n{a}\n")

ValueError: Length of values (4) does not match length of index (5)
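
- If the labels come from somewhere you do not control, one possible way to cope is to catch the `ValueError` and fall back to the default index. This is only a sketch; the fallback choice is an assumption, not something pandas does for you.

```python
from pandas import Series as series

values = [1, "Hello", False, 3.33]
labels = ["a", "b", "c", "d", "e"]  # one label too many

try:
    a = series(values, index=labels)
except ValueError as err:
    print(f"Could not build the series: {err}")
    # Fall back to the default positional index.
    a = series(values)

print(f"Resulting series:\n{a}")
```
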

# DataFrame
- DataFrame is a 2-D, mutable, and heterogeneous tabular data structure with labeled axes
- A common way to create a DataFrame is from a dictionary in which the value of each key is a list, with all the lists having the same length
- For example:

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj)
print(table)

- Printing the DataFrame displays it as a neatly aligned table
## Customizing order of columns
- Just as the index of a series can be customized, the order of the columns in a dataframe can be customized too
- The keyword argument `columns` is used to customize the order of the columns in the dataframe
- It is a list of the keys of the object, in the order you want to view them.
- It can optionally be passed when creating the dataframe.
- Here is an example:

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, columns = ["team", "name", "role"])
print(table)

- Here, if the `columns` list contains a key that is not present in the object passed to DataFrame, every value in that column is shown as `NaN`, again.
- Look at the example below

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, columns = ["name", "role", "team", "totalRuns"])
print(table)

## Customizing index of DataFrame
- As with a Series, the index values of a DataFrame can be customized by passing the `index` argument.
- Example:

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])
print(table)

## Accessing columns
- The columns of a dataframe can be accessed with dictionary-like syntax
- The syntax for accessing a column separately is:
1. `table.column`
- This attribute-like syntax works only if the column name is a valid Python identifier (for example, it has no space) and does not clash with an existing DataFrame attribute or method.
- It fails if there is a space in the name of the column
2. `table[column]`
- This syntax works for any column name, whether or not it contains a space, as long as the column exists

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])
col = table.name
print(f"Names of the players are:\n{col}\n")
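
- The cell above uses only the attribute syntax. As a sketch of the bracket syntax, the hypothetical column name `"total runs"` below contains a space, so it can be reached only with brackets.

```python
from pandas import DataFrame as df

# "total runs" contains a space, so table.total runs would be a syntax error.
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "total runs": [19700, 10867, 27599],
}
table = df(obj, index=[1, 2, 3])

# Bracket syntax works for a name with a space.
runs = table["total runs"]
print(f"Total runs of the players are:\n{runs}\n")

# Bracket syntax works for ordinary names too.
names = table["name"]
print(f"Names of the players are:\n{names}")
```
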

## Accessing column's names
- The names of all the columns in a DataFrame can be accessed via the `columns` attribute of the dataframe
- For example

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])
head = table.columns
print(f"Column names of the table are:\n{head}")

## Accessing rows
- The rows of the dataframe can be accessed by index label using the `loc[]` attribute of the dataframe.

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])
row = table.loc[3]
print(f"Information of king:\n{row}")
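
- Beyond a single label, `loc[]` also accepts a list of labels or a row label together with a column name. A short sketch on the same table:

```python
from pandas import DataFrame as df

obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index=[1, 2, 3])

# A list of labels returns a smaller DataFrame with just those rows.
subset = table.loc[[1, 3]]
print(f"Rows 1 and 3:\n{subset}\n")

# A row label plus a column name returns a single cell.
team = table.loc[3, "team"]
print(f"Team of the player in row 3: {team}")
```
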

## Creating new columns
- New columns can be created easily in any existing DataFrame with the same dictionary-like syntax
- The value can be either a single value or an array
- If the value is an array whose length does not match the number of rows, it will again throw an error
- If the value is a single value, it is automatically repeated for every row.
- For example, see below to get a clearer idea:

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])

# A single value as the value of the column
table["runs"] = "10000+"
print(f"Table showing common total runs between 3 batters:\n{table}\n")

# An array as the value of the column
table["runs"] = [19700, 10867, 27599]
print(f"Table showing the specific runs of each batter:\n{table}")
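
- The error case mentioned above is not shown in the cell, so here is a sketch of it. It assumes, as in current pandas versions, that a list of the wrong length raises a `ValueError` and leaves the table unchanged.

```python
from pandas import DataFrame as df

obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index=[1, 2, 3])

failed = False
try:
    # Only 2 values for 3 rows: the lengths do not match.
    table["runs"] = [19700, 10867]
except ValueError as err:
    failed = True
    print(f"Could not add the column: {err}")

# The failed assignment did not create the column.
column_added = "runs" in table.columns
print(f"Was the 'runs' column added? {column_added}")
```
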

## Deleting columns
- Existing columns can be deleted from the table with the `del` keyword, just as with a dictionary

In [None]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])

# Before deleting a column
print(f"Table before deleting a column:\n{table}\n")

# Deleting the "team" column
del table["team"]

# Table after deleting the "team" column
print(f"Table after deleting the 'team' column:\n{table}")

## Transposing DataFrame
- Like NumPy arrays, a Pandas DataFrame can be transposed
- The same attribute, `T`, is used for the purpose
- For example:

In [19]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])

# Normal table
print(f"Normal table is:\n{table}\n")

# Transposing the table
tp = table.T
print(f"Transposing the table gives:\n{tp}")

Normal table is:
        name     role  team
1  R. Sharma  Batsman    MI
2  S. Dhawan  Batsman  PBKS
3   V. Kohli  Batsman   RCB

Transposing the table gives:
              1          2         3
name  R. Sharma  S. Dhawan  V. Kohli
role    Batsman    Batsman   Batsman
team         MI       PBKS       RCB


## Re-indexing
- In Pandas, objects can be reindexed by using the `reindex` method of either a Series or a DataFrame
- Reindexing returns a new object conformed to the new index and leaves the original unchanged: rows whose labels exist in the original keep their data, and labels that did not exist are filled with `NaN`.
- A code example is given below:

In [20]:
from pandas import DataFrame as df
obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index = [1, 2, 3])

# Before reindexing
print(f"Before reindexing:\n{table}\n")

# Reindexing
rI = table.reindex(["a", "b", "c"])

# After reindexing
print(f"After reindexing:\n{rI}")

Before reindexing:
        name     role  team
1  R. Sharma  Batsman    MI
2  S. Dhawan  Batsman  PBKS
3   V. Kohli  Batsman   RCB

After reindexing:
  name role team
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN
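
- Every row above is `NaN` because none of the labels `"a", "b", "c"` exists in the original index. As a sketch, reindexing with labels that partly overlap the original keeps the matching rows; the label `4` is chosen here only because it does not exist in the table.

```python
from pandas import DataFrame as df

obj = {
    "name": ["R. Sharma", "S. Dhawan", "V. Kohli"],
    "role": ["Batsman", "Batsman", "Batsman"],
    "team": ["MI", "PBKS", "RCB"]
}
table = df(obj, index=[1, 2, 3])

# Labels 3 and 1 exist, so their rows are kept (in the new order).
# Label 4 does not exist, so its row is filled with NaN.
rI = table.reindex([3, 1, 4])
print(f"After reindexing with partly matching labels:\n{rI}\n")

# The original table is left unchanged.
print(f"Original table:\n{table}")
```
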
