## Python Libraries - Pandas - Describing Data

You will be working with files of varied shapes and sizes in pandas. Once you have loaded the data into a dataframe, you should check the created dataframe. However, it would be inefficient to print the entire dataframe every time. Hence, you should learn how to print a limited number of rows of a dataframe.

In [1]:
# import the required libraries
import pandas as pd
import numpy as np

In [2]:
# Read data from the file 'sales.xlsx'
sales = pd.read_excel('./files/sales.xlsx')

# Check the created dataframe
sales

,Market,Region,No_of_Orders,Profit,Sales
0,Africa,Western Africa,251,-12901.51,78476.06
1,Africa,Southern Africa,85,11768.58,51319.5
2,Africa,North Africa,182,21643.08,86698.89
3,Africa,Eastern Africa,110,8013.04,44182.6
4,Africa,Central Africa,103,15606.3,61689.99
5,Asia Pacific,Western Asia,382,-16766.9,124312.24
6,Asia Pacific,Southern Asia,469,67998.76,351806.6
7,Asia Pacific,Southeastern Asia,533,20948.84,329751.38
8,Asia Pacific,Oceania,646,54734.02,408002.98
9,Asia Pacific,Eastern Asia,414,72805.1,315390.77


In [3]:
# Read the file with 'Region' as the index column
sales = pd.read_excel('./files/sales.xlsx', index_col=1)

# Check the created dataframe
sales

Region,Market,No_of_Orders,Profit,Sales
Western Africa,Africa,251,-12901.51,78476.06
Southern Africa,Africa,85,11768.58,51319.5
North Africa,Africa,182,21643.08,86698.89
Eastern Africa,Africa,110,8013.04,44182.6
Central Africa,Africa,103,15606.3,61689.99
Western Asia,Asia Pacific,382,-16766.9,124312.24
Southern Asia,Asia Pacific,469,67998.76,351806.6
Southeastern Asia,Asia Pacific,533,20948.84,329751.38
Oceania,Asia Pacific,646,54734.02,408002.98
Eastern Asia,Asia Pacific,414,72805.1,315390.77


In [4]:
# Printing first 5 entries from a dataframe
sales.head()

Region,Market,No_of_Orders,Profit,Sales
Western Africa,Africa,251,-12901.51,78476.06
Southern Africa,Africa,85,11768.58,51319.5
North Africa,Africa,182,21643.08,86698.89
Eastern Africa,Africa,110,8013.04,44182.6
Central Africa,Africa,103,15606.3,61689.99


In [5]:
# Printing first 8 entries of a dataframe
sales.head(8)

Region,Market,No_of_Orders,Profit,Sales
Western Africa,Africa,251,-12901.51,78476.06
Southern Africa,Africa,85,11768.58,51319.5
North Africa,Africa,182,21643.08,86698.89
Eastern Africa,Africa,110,8013.04,44182.6
Central Africa,Africa,103,15606.3,61689.99
Western Asia,Asia Pacific,382,-16766.9,124312.24
Southern Asia,Asia Pacific,469,67998.76,351806.6
Southeastern Asia,Asia Pacific,533,20948.84,329751.38


In [None]:
# Printing last 5 entries of the dataframe
sales.tail()

In [None]:
# Printing last 3 entries of the dataframe
sales.tail(3)

#### Summarising the dataframes

A dataframe can have many columns, and it is important to understand what each one stores. You should be familiar with the column names, the data type of each column and the kind of values it holds. Let's look at the commands that help you do that.
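
As a minimal sketch of the kind of commands meant here, the block below summarises a small dataframe. The name `sales_demo` is hypothetical, and its three rows are copied from the output shown earlier so that the example runs without the Excel file.

```python
import pandas as pd

# Hypothetical miniature of the sales data (three rows copied from the output above)
sales_demo = pd.DataFrame(
    {
        "Market": ["Africa", "Africa", "Asia Pacific"],
        "No_of_Orders": [251, 85, 382],
        "Profit": [-12901.51, 11768.58, -16766.90],
        "Sales": [78476.06, 51319.50, 124312.24],
    },
    index=pd.Index(["Western Africa", "Southern Africa", "Western Asia"], name="Region"),
)

# Structure: index type, column names, non-null counts and data types
sales_demo.info()

# Number of rows and columns, and the data type of each column
print(sales_demo.shape)
print(sales_demo.dtypes)

# Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns
summary = sales_demo.describe()
print(summary)
```

By default `describe()` covers only the numeric columns, so `Market` does not appear in `summary`.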

In [None]:
# Summarising the dataframe structure


In [None]:
# Summary of data stored in each column


In [None]:
# Graphically summarising the spread of the columns - Profit and Sales


## Python Libraries - Pandas - Indexing and Slicing

In this section, you will:

* Select rows from a dataframe
* Select columns from a dataframe
* Select subsets of dataframes

### Selecting Rows

Selecting rows in dataframes is similar to slicing NumPy arrays. The syntax ```df[start_index:end_index]``` selects rows by position, from ```start_index``` up to, but not including, ```end_index```.
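
The slicing syntax can be tried on a small stand-in dataframe. This is only a sketch: `df` is a hypothetical frame holding the first six regions and order counts from the output shown earlier.

```python
import pandas as pd

# Hypothetical stand-in for the sales data, indexed by region name
df = pd.DataFrame(
    {"No_of_Orders": [251, 85, 182, 110, 103, 382]},
    index=[
        "Western Africa", "Southern Africa", "North Africa",
        "Eastern Africa", "Central Africa", "Western Asia",
    ],
)

first_three = df[0:3]   # rows at positions 0, 1 and 2 (position 3 is excluded)
every_other = df[0::2]  # rows at positions 0, 2 and 4
last_two = df[-2:]      # the final two rows

print(first_three)
print(every_other)
print(last_two)
```

Although the index holds region names, an integer slice inside ```[]``` still counts rows by position.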

In [None]:
# Read data from the file 'sales.xlsx'
sales = pd.read_excel('./files/sales.xlsx', index_col=1)

# Check the created dataframe
# Remember - you should print limited number of entries to check the dataframe


In [None]:
# Selecting first 5 rows of the dataframe
sales[0:5]

In [None]:
# Selecting the rows at even positions (0, 2, 4, ...) of the dataframe
sales[0::2]

### Selecting Columns

There are two simple ways to select a single column from a dataframe:

-  ```df['column']``` or ```df.column``` return a series
-  ```df[['col_x', 'col_y']]``` returns a dataframe
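
The difference between the two forms can be checked with `type()`. The sketch below uses a hypothetical three-row frame `df` with values copied from the output shown earlier.

```python
import pandas as pd

# Hypothetical three-row stand-in for the sales data
df = pd.DataFrame(
    {
        "Profit": [-12901.51, 11768.58, 21643.08],
        "Sales": [78476.06, 51319.50, 86698.89],
    },
    index=["Western Africa", "Southern Africa", "North Africa"],
)

profit_series = df["Profit"]   # single brackets with one name: a Series
profit_attr = df.Profit        # attribute access: the same Series
profit_frame = df[["Profit"]]  # a list of names: a dataframe with one column

print(type(profit_series))
print(type(profit_frame))
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method, so the bracket form is the safer habit.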

In [None]:
# Select the column 'Profit' from the dataframe 'sales'. Output must be in the form of a dataframe.


In [None]:
# Check the type of the sliced data


In [None]:
# Select the column 'Profit' from the dataframe 'sales'. Output must be in the form of a series.


In [None]:
# Check the type of the sliced data


#### Selecting Multiple Columns 

You can select multiple columns by passing the list of column names inside the ```[]```: ```df[['column_1', 'column_2', 'column_n']]```.
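
A short sketch of this syntax, again on a hypothetical frame `df` with values taken from the output shown earlier. The columns of the result follow the order of the list, not the order in the original dataframe.

```python
import pandas as pd

# Hypothetical three-row stand-in for the sales data
df = pd.DataFrame(
    {
        "Market": ["Africa", "Africa", "Asia Pacific"],
        "Profit": [-12901.51, 11768.58, -16766.90],
        "Sales": [78476.06, 51319.50, 124312.24],
    }
)

# The inner brackets build a Python list of names, the outer brackets index the dataframe
subset = df[["Sales", "Market"]]

print(subset)
print(list(subset.columns))
```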

In [None]:
# Selecting multiple columns from a dataframe


### Label and Position Based Indexing: ```df.loc``` and ```df.iloc```

You have seen some ways of selecting rows and columns from dataframes. Let's now see some other ways of indexing dataframes, which pandas recommends, since they are more explicit (and less ambiguous).

There are two main ways of indexing dataframes:
1. Label based indexing using ```df.loc```
2. Position based indexing using ```df.iloc```

Using both methods, we will do the following indexing operations on a dataframe:
* Selecting single elements/cells
* Selecting single and multiple rows
* Selecting single and multiple columns
* Selecting multiple rows and columns
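
The operations listed above can be sketched side by side for both indexers. The frame `df` below is a hypothetical three-row stand-in with values copied from the output shown earlier, so the labels differ from the ones used in the exercises.

```python
import pandas as pd

# Hypothetical stand-in for the sales data, indexed by region
df = pd.DataFrame(
    {
        "Market": ["Africa", "Africa", "Asia Pacific"],
        "Profit": [-12901.51, 11768.58, -16766.90],
        "Sales": [78476.06, 51319.50, 124312.24],
    },
    index=pd.Index(["Western Africa", "Southern Africa", "Western Asia"], name="Region"),
)

# Single cell: by labels with loc, by integer positions with iloc
cell_by_label = df.loc["Western Africa", "Profit"]
cell_by_position = df.iloc[0, 1]

# A single row comes back as a Series, a list of rows as a dataframe
one_row = df.loc["Western Africa"]
two_rows = df.loc[["Western Africa", "Western Asia"]]

# Multiple rows and columns at once
block_by_label = df.loc[["Western Africa", "Western Asia"], ["Profit", "Sales"]]
block_by_position = df.iloc[[0, 2], 1:]  # rows at positions 0 and 2, columns from position 1 on

# Label slices include both end points, position slices exclude the stop
label_slice = df.loc["Western Africa":"Southern Africa"]
position_slice = df.iloc[0:1]

print(block_by_label)
print(len(label_slice), len(position_slice))
```

The last two lines show the main trap when switching between the indexers: a ```df.loc``` slice keeps its end label, while a ```df.iloc``` slice stops before its end position.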

**Label-based Indexing**

In [None]:
# Select the row with index label as 'Canada'


In [None]:
# Select the rows with index labels 'Canada' and 'Western Africa'


In [None]:
# Select the rows with index labels 'Canada' and 'Western Africa' along with the columns 'Profit' and 'Sales'


**Position-based Indexing**

In [None]:
# Select the top 5 rows and all the columns starting from second column


In [None]:
# Select all the entries with positive profit


In [None]:
# Count the number of entries in the dataframe with positive profit


In [None]:
# Select all the entries in the Latin American and European markets where Sales > 250000
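
The last three exercises select rows by condition rather than by label or position. A minimal sketch of that technique follows, assuming a hypothetical four-row frame `df` with values copied from the output shown earlier. The markets and the threshold in the sketch are illustrative and differ from those in the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the sales data
df = pd.DataFrame(
    {
        "Market": ["Africa", "Africa", "Asia Pacific", "Asia Pacific"],
        "Profit": [-12901.51, 11768.58, -16766.90, 54734.02],
        "Sales": [78476.06, 51319.50, 124312.24, 408002.98],
    },
    index=pd.Index(
        ["Western Africa", "Southern Africa", "Western Asia", "Oceania"], name="Region"
    ),
)

# A comparison yields a boolean Series aligned with the rows
positive_mask = df["Profit"] > 0

# Indexing with the mask keeps only the rows where it is True
positive = df[positive_mask]

# Summing a boolean mask counts its True values
n_positive = int(positive_mask.sum())

# Combine conditions with & (and) or | (or), wrapping each comparison in parentheses
large = df[df["Market"].isin(["Africa", "Asia Pacific"]) & (df["Sales"] > 100000)]

print(positive)
print(n_positive)
print(large)
```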


## Python Libraries - Pandas - Operations on Dataframes

In [None]:
# Checking the dataframe 'sales'


In [None]:
# Converting the Sales amount to Sales in thousand


# Checking the dataframe 'sales'


In [None]:
# Renaming the column: 'Sales' to 'Sales in thousand'


In [None]:
# Checking the dataframe 'sales'


In [None]:
# Help on rename function


In [None]:
# Role of the inplace parameter


In [None]:
# Creating a new column: 'Positive Profit' using apply function and lambda operation
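
The operations named in the cells above (scaling a column, renaming it, the ```inplace``` parameter and ```apply``` with a lambda) can be sketched on a toy frame. The frame `df` is hypothetical, with values copied from the output shown earlier, and the boolean encoding of 'Positive Profit' is an assumption, since the notebook does not say what the new column should hold.

```python
import pandas as pd

# Hypothetical three-row stand-in for the sales data
df = pd.DataFrame(
    {
        "Profit": [-12901.51, 11768.58, -16766.90],
        "Sales": [78476.06, 51319.50, 124312.24],
    },
    index=pd.Index(["Western Africa", "Southern Africa", "Western Asia"], name="Region"),
)

# Arithmetic on a column is applied to every element
df["Sales"] = df["Sales"] / 1000

# By default rename returns a new dataframe and leaves df unchanged
renamed = df.rename(columns={"Sales": "Sales in thousand"})
print(list(df.columns))       # still contains 'Sales'
print(list(renamed.columns))

# With inplace=True the dataframe itself is modified
df.rename(columns={"Sales": "Sales in thousand"}, inplace=True)

# apply runs the lambda on each value of the column (assumed encoding: True/False)
df["Positive Profit"] = df["Profit"].apply(lambda profit: profit > 0)

print(df)
```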


In [None]:
# Resetting the index


# Setting hierarchical index: Market, Region


# Checking the dataframe


In [None]:
# Fetching the rows under African market


In [None]:
# Fetching the rows under African and European market


In [None]:
# Fetching the rows under Western Europe in European market
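
The hierarchical-index steps above can be sketched as follows. The frame is a hypothetical four-row stand-in with values copied from the output shown earlier, so it uses the African and Asia Pacific markets instead of the European one.

```python
import pandas as pd

# Hypothetical stand-in for the sales data, indexed by region
df = pd.DataFrame(
    {
        "Market": ["Africa", "Africa", "Asia Pacific", "Asia Pacific"],
        "Region": ["Western Africa", "Southern Africa", "Western Asia", "Oceania"],
        "Sales": [78476.06, 51319.50, 124312.24, 408002.98],
    }
).set_index("Region")

# Move the current index back into an ordinary column
flat = df.reset_index()

# Build a two-level index (Market, then Region) and sort it for efficient lookups
nested = flat.set_index(["Market", "Region"]).sort_index()

# All rows under one outer label
africa = nested.loc["Africa"]

# All rows under several outer labels
both = nested.loc[["Africa", "Asia Pacific"]]

# One value addressed by an (outer, inner) tuple plus a column name
oceania_sales = nested.loc[("Asia Pacific", "Oceania"), "Sales"]

print(nested)
print(africa)
print(oceania_sales)
```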


In [None]:
# Printing summary of the sales dataframe
