# **Introduction to Pandas**

## __Agenda__

- Fundamentals of Pandas
  * Purpose of Pandas
  * Features of Pandas
- Data Structures
- Introduction to Series
  * Creating and Accessing Pandas Series Using Different Methods
  * Basic Information in Pandas Series
  * Operations and Transformations in Pandas Series
  * Querying a Series

## __1. Fundamentals of Pandas__

Pandas is an open-source Python library, built on top of NumPy, for data manipulation and analysis.

- It introduces data structures like DataFrame and Series that make working with structured data more efficient.

![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Lesson_04_Working_with_Pandas/1_Introduction_to_Pandas/pandas.png)

### __1.1 Purpose of Pandas__
![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Lesson_04_Working_with_Pandas/1_Introduction_to_Pandas/Purpose_of_Pandas.png)

### __1.2 Features of Pandas__
![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Updated_Images/Lesson_4/4_01/Features_of_Pandas.png)




## __2. Data Structures__
The two main data structures in Pandas are Series and DataFrame:
![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Lesson_04_Working_with_Pandas/1_Introduction_to_Pandas/Data_Structures.png)

## __3. Introduction to Series__
A Series is a one-dimensional, array-like object that holds a sequence of values together with an associated set of labels, called its index.

It can be created with different data inputs:
![link text](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ADSP_Images/Updated_Images/Lesson_4/4_01/Introduction_to_Series.png)
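
Besides the list and dictionary inputs used in the cells of this lesson, a Series can also be built from a NumPy array or from a single scalar value. The following is a minimal sketch; the variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a NumPy array: values keep their order and get a default 0..n-1 index
array_series = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a scalar: the value is repeated for every label in the supplied index
scalar_series = pd.Series(7, index=['x', 'y', 'z'])

print(array_series)
print(scalar_series)
```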

### __3.1 Creating and Accessing Pandas Series Using Different Methods__

In [None]:
import pandas as pd

# Creating a Pandas Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series)

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [None]:
# Creating a Pandas Series with a specified index
index = ['a', 'b', 'c', 'd', 'e']
series_with_index = pd.Series(data, index=index)
print(series_with_index)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [None]:
# Creating a Pandas Series from a dictionary (the keys become the index labels)
data_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_from_dict = pd.Series(data_dict)
print(series_from_dict)

# Accessing data in a Series
print(series[2])  # Element with label 2 in the default integer index
print(series_with_index['b'])  # Element with label 'b'

a    1
b    2
c    3
d    4
e    5
dtype: int64
3
2
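
When a dictionary is combined with an explicit `index`, Pandas selects values by label rather than by position. The sketch below uses hypothetical data to show that labels missing from the dictionary produce `NaN`, and dictionary keys not listed in the index are dropped.

```python
import pandas as pd

prices = {'a': 1, 'b': 2, 'c': 3}  # hypothetical data for illustration

# Only the labels listed in `index` are kept, in that order.
# 'z' has no entry in the dictionary, so its value is NaN.
reindexed = pd.Series(prices, index=['c', 'a', 'z'])
print(reindexed)
```

Because `NaN` is a floating-point value, the resulting Series has dtype `float64` even though the dictionary holds integers.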


### __3.2 Basic Information in Pandas Series__
Methods such as `head()`, `tail()`, `describe()`, `unique()`, and `nunique()`, together with the `shape` attribute, summarize the contents of a Series and support quick data exploration.

In [None]:
# Return the first n rows
first_n_rows = series.head(3)
print(first_n_rows)


0    1
1    2
2    3
dtype: int64


In [None]:
# Return the last n rows
last_n_rows = series.tail(3)
print(last_n_rows)

2    3
3    4
4    5
dtype: int64


In [None]:
# Return the shape; a Series is one-dimensional, so this is a one-element tuple (rows,)
dimensions = series.shape
print(dimensions)

(5,)


In [None]:
# Generate descriptive statistics
stats = series.describe()
print(stats)

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64
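
Each row of the `describe()` output can also be computed on its own. The sketch below recreates the same five values under a new name (`values`, chosen here for illustration) and calls the individual methods.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

print(values.count())          # number of non-missing values
print(values.mean())           # arithmetic mean
print(values.std())            # sample standard deviation (divides by n - 1)
print(values.quantile(0.25))   # first quartile, the 25% row
print(values.median())         # the 50% row
print(values.min(), values.max())
```

Note that `std()` uses the sample formula by default, which is why the result is about 1.58 rather than the population value of about 1.41.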


In [None]:
import pandas as pd
cat_data = ['House A', 'House B', 'House C', 'House D','House A', 'House B', 'House C', 'House D']
cat_series = pd.Series(cat_data)
print(cat_series)

0    House A
1    House B
2    House C
3    House D
4    House A
5    House B
6    House C
7    House D
dtype: object


In [None]:
# Return unique values
unique_values = cat_series.unique()
print(unique_values)

['House A' 'House B' 'House C' 'House D']


In [None]:
# Return the number of unique values (the cardinality of the Series)
num_unique_values = cat_series.nunique()
print(num_unique_values)

4
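
Where `unique()` lists the distinct values and `nunique()` counts them, `value_counts()` reports how often each value occurs. A minimal sketch, rebuilding the same categorical data under the illustrative name `houses`:

```python
import pandas as pd

houses = pd.Series(['House A', 'House B', 'House C', 'House D',
                    'House A', 'House B', 'House C', 'House D'])

# Frequency of each distinct value; the result is itself a Series
counts = houses.value_counts()
print(counts)
```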


### __3.3 Operations and Transformations in Pandas Series__
A Pandas Series supports element-wise arithmetic, function application, value mapping, sorting, and missing-value handling.

These operations are used to modify and clean data so that it is ready for analysis or visualization.

In [None]:
series2 = pd.Series([10, 20, 30, 40, 50])
print(series2)

0    10
1    20
2    30
3    40
4    50
dtype: int64


In [None]:
# Element-wise addition
result_series = series + series2
print(series)
print(series2)
print(result_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
0    10
1    20
2    30
3    40
4    50
dtype: int64
0    11
1    22
2    33
3    44
4    55
dtype: int64
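
The addition above pairs values by index label, not by position. The two Series happen to share the same default index, so the distinction is invisible there. The sketch below uses two hypothetical Series with only partly overlapping labels to make the alignment visible.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one operand ('a' and 'd') produce NaN
aligned_sum = left + right
print(aligned_sum)

# add() with fill_value treats a missing label as 0 instead
filled_sum = left.add(right, fill_value=0)
print(filled_sum)
```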


In [None]:
# Apply a function to each element
squared_series = series2.apply(lambda x: x**2)
print(squared_series)
print(series2)

0     100
1     400
2     900
3    1600
4    2500
dtype: int64
0    10
1    20
2    30
3    40
4    50
dtype: int64
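
For simple arithmetic such as squaring, the operator can be applied to the whole Series directly, which avoids calling a Python function once per element. A short sketch comparing the two approaches:

```python
import pandas as pd

series2 = pd.Series([10, 20, 30, 40, 50])

# Vectorized: the ** operator acts on every element at once
squared_vectorized = series2 ** 2

# Element by element: apply() calls the lambda for each value
squared_applied = series2.apply(lambda x: x**2)

# Both approaches give the same values
print(squared_vectorized.tolist() == squared_applied.tolist())
```

`apply()` remains useful when the transformation cannot be written as a built-in Series operation.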


In [None]:
# A named function can be passed to apply() in place of the lambda
def calc_square(x):
    return x**2

In [None]:
# Map values using a dictionary
mapped_series = series.map({1: 'one', 2: 'two', 3: 'three'})
print(series)
print(mapped_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
0      one
1      two
2    three
3      NaN
4      NaN
dtype: object


In [None]:
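# na_action=None is the default, so the result matches the previous cell;
# values with no matching key in the dictionary still become NaN
mapped_series = series.map({1: 'one', 2: 'two', 3: 'three'}, na_action=None)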
print(series)
print(mapped_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
0      one
1      two
2    three
3      NaN
4      NaN
dtype: object
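
Values without a matching dictionary key become `NaN`, as rows 3 and 4 show. Two possible ways to avoid that are sketched below; the placeholder label `'other'` and the variable names are illustrative choices.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5])
names = {1: 'one', 2: 'two', 3: 'three'}

# Option 1: replace the NaN left by unmatched keys with a placeholder
with_placeholder = numbers.map(names).fillna('other')
print(with_placeholder)

# Option 2: map with a function that falls back to the original value
with_fallback = numbers.map(lambda x: names.get(x, x))
print(with_fallback)
```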


In [None]:
new_series = pd.Series([3,2,1,4,5])
print(new_series)

0    3
1    2
2    1
3    4
4    5
dtype: int64


In [None]:
# Sort the Series by values
sorted_series = new_series.sort_values(ignore_index=True)
print(sorted_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
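
With `ignore_index=True` the sorted result is relabeled from 0. Without it, each value keeps its original label, which makes it possible to return to the original order later. A brief sketch using the same values under the illustrative name `scores`:

```python
import pandas as pd

scores = pd.Series([3, 2, 1, 4, 5])

# Sort from largest to smallest; the original labels travel with the values
descending = scores.sort_values(ascending=False)
print(descending)

# Sorting by index restores the original order
restored = descending.sort_index()
print(restored)
```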


In [None]:
# Check for missing values
missing_values = series.isnull()
print(missing_values)

0    False
1    False
2    False
3    False
4    False
dtype: bool


In [None]:
series.isnull().sum()

0

In [None]:
# Fill missing values with a specified value
filled_series = series.fillna(0)
print(series)
print(filled_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
0    1
1    2
2    3
3    4
4    5
dtype: int64
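
Because `series` contains no missing values, `fillna(0)` leaves it unchanged. The sketch below uses a hypothetical Series that does contain `NaN` to show the effect of `fillna()` and `dropna()`.

```python
import numpy as np
import pandas as pd

readings = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])  # hypothetical data

# Count the missing entries
missing_count = readings.isnull().sum()
print(missing_count)

# mean() skips NaN by default, so this fills the gaps with 3.0
filled_with_mean = readings.fillna(readings.mean())
print(filled_with_mean)

# Alternatively, remove the missing entries altogether
dropped = readings.dropna()
print(dropped)
```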


### __3.4 Querying a Series__
Selecting and filtering data based on specific conditions is an essential aspect of querying a Pandas Series.

The following examples illustrate common querying operations that can be applied to a Pandas Series:

In [None]:
# Map values using a dictionary
mapped_series = series.map({1: 3, 2: 4, 3: 5})
print(series)
print(mapped_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
0    3.0
1    4.0
2    5.0
3    NaN
4    NaN
dtype: float64


In [None]:
mapped_series.isnull()

0    False
1    False
2    False
3     True
4     True
dtype: bool


In [None]:
mapped_series.isnull().sum()

2

In [None]:
print(mapped_series)
mapped_series = mapped_series.fillna(10)
print(mapped_series)

0    3.0
1    4.0
2    5.0
3    NaN
4    NaN
dtype: float64
0     3.0
1     4.0
2     5.0
3    10.0
4    10.0
dtype: float64


In [None]:
new_dict = {1:11,2:12,3:13,4:14,5:15,6:16,7:17,8:18,9:19}
print(new_dict)

{1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17, 8: 18, 9: 19}


In [None]:
series_new_dict = pd.Series(new_dict)
print(series_new_dict)

1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64


In [None]:
series_new_dict.head()

1    11
2    12
3    13
4    14
5    15
dtype: int64


In [None]:
series_new_dict.tail()

5    15
6    16
7    17
8    18
9    19
dtype: int64


In [None]:
print(series_new_dict[7])  # Looks up the label 7, not position 7; the index runs from 1 to 9

17


In [None]:
# Create a Pandas Series
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)
print(series)

a    10
b    20
c    30
d    40
e    50
dtype: int64


In [None]:
# Select elements greater than 30
selected_greater_than_30 = series[series > 30]
print(selected_greater_than_30)

d    40
e    50
dtype: int64


In [None]:
# Select elements equal to 20
selected_equal_to_20 = series[series == 20]
print(selected_equal_to_20)

b    20
dtype: int64


In [None]:
# Select elements not equal to 40
selected_not_equal_to_40 = series[series != 40]
print(selected_not_equal_to_40)

a    10
b    20
c    30
e    50
dtype: int64


In [None]:
# Select elements based on multiple conditions
selected_multiple_conditions = series[(series > 20) & (series < 50)]
print(selected_multiple_conditions)

c    30
d    40
dtype: int64
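
Conditions can also be combined with `|` (or) and inverted with `~` (not). Each condition needs its own parentheses because these operators bind more tightly than the comparisons. A minimal sketch on the same data:

```python
import pandas as pd

series = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50})

# Either condition may hold: values below 20 or above 40
outside_range = series[(series < 20) | (series > 40)]
print(outside_range)

# Negation: values that are NOT in the given list
not_in_list = series[~series.isin([20, 40])]
print(not_in_list)
```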


In [None]:
# Select elements based on a list of values
selected_by_list = series[series.isin([20, 40, 60])]
print(selected_by_list)

b    20
d    40
dtype: int64


In [None]:
# Select elements using string methods (if applicable)
string_series = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
print(string_series)
selected_by_string_method = string_series[(string_series.str.startswith('ba')) | (string_series.str.startswith('ch'))]
print(selected_by_string_method)

0         apple
1        banana
2        cherry
3          date
4    elderberry
dtype: object
1    banana
2    cherry
dtype: object
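
The `.str` accessor offers further methods that return Boolean or numeric results suitable for filtering. The sketch below shows `str.contains()` and `str.len()` on the same fruit names, stored here under the illustrative name `fruits`.

```python
import pandas as pd

fruits = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])

# Keep strings containing the literal substring 'an'
contains_an = fruits[fruits.str.contains('an', regex=False)]
print(contains_an)

# Keep strings longer than five characters
long_names = fruits[fruits.str.len() > 5]
print(long_names)
```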


In [None]:
print(series)
# Query based on index labels
selected_by_index_labels = series.loc[['c', 'd', 'e']]
print(selected_by_index_labels)

a    10
b    20
c    30
d    40
e    50
dtype: int64
c    30
d    40
e    50
dtype: int64


In [None]:
print(series)

# Query based on numeric position
selected_by_numeric_position = series.iloc[2:4]
print(selected_by_numeric_position)

a    10
b    20
c    30
d    40
e    50
dtype: int64
c    30
d    40
dtype: int64
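
One difference between the two accessors is easy to miss: a label slice with `.loc` includes its end label, while a position slice with `.iloc` excludes its end position. A short sketch on the same data:

```python
import pandas as pd

series = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50})

# Label-based slice: both 'c' and 'd' are included
by_label = series.loc['c':'d']
print(by_label)

# Position-based slice: position 2 is included, position 4 is not
by_position = series.iloc[2:4]
print(by_position)
```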


# __Assisted Practice__

## __Problem Statement:__
Use Pandas Series to analyze sales data for a retail store over a week and draw insights from the data.

### __Dataset:__
__Sample sales data__

sales_data = [120, 150, 130, 170, 160, 180, 140]

days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

## __Steps to Perform__

1. Create a Pandas Series for sales data
   - Use a list of daily sales figures to create a Pandas Series
   - Assign days of the week as the index
2. Access and manipulate sales data
   - Access sales data for specific days using index labels
   - Calculate total sales for the week
   - Identify the day with the highest and lowest sales
3. Basic analysis of sales data
   - Calculate the average sales for the week
   - Determine the days with sales figures significantly different from the average
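
One possible approach to these steps is sketched below. The problem statement does not define "significantly different", so this sketch assumes it means more than one sample standard deviation away from the weekly average. Other thresholds would be equally valid.

```python
import pandas as pd

sales_data = [120, 150, 130, 170, 160, 180, 140]
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                'Friday', 'Saturday', 'Sunday']

# Step 1: create the Series with days of the week as the index
sales = pd.Series(sales_data, index=days_of_week)

# Step 2: access and manipulate the data
print(sales['Wednesday'])        # sales for a specific day
total_sales = sales.sum()        # total for the week
best_day = sales.idxmax()        # label of the highest value
worst_day = sales.idxmin()       # label of the lowest value
print(total_sales, best_day, worst_day)

# Step 3: basic analysis
average_sales = sales.mean()
# Assumption: "significantly different" = more than one standard deviation away
threshold = sales.std()
unusual_days = sales[(sales - average_sales).abs() > threshold]
print(average_sales)
print(unusual_days)
```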