
# **Introduction to Pandas in Python**

# Python Pandas - Series

### **Creating a series**

Pandas is a popular library for data manipulation in Python, and it provides various ways to create a Series, which is a one-dimensional labeled array. Here are different ways to create a Series in Python using Pandas:

**From a List or NumPy Array:**

You can create a Series from a Python list or a NumPy array. By default, Pandas will assign integer labels as the index.

In [None]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data)
series_from_list

0    1
1    2
2    3
3    4
4    5
dtype: int64
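The cell above uses a list. A NumPy array works the same way; the short sketch below (with made-up values) suggests that the Series takes its dtype from the array.

```python
import numpy as np
import pandas as pd

# Illustrative values only; any one-dimensional array would do
array_data = np.array([1.5, 2.5, 3.5])
series_from_array = pd.Series(array_data)

print(series_from_array)
print(series_from_array.dtype)  # dtype is taken from the NumPy array
```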

**Specifying Custom Index:**

You can specify custom labels (index) for the Series by providing an index parameter.

In [None]:
data = [1, 2, 3, 4, 5]
custom_index = ['a', 'b', 'c', 'd', 'e']
series_with_custom_index = pd.Series(data, index=custom_index)
series_with_custom_index

a    1
b    2
c    3
d    4
e    5
dtype: int64

**From a Dictionary:**

You can create a Series from a Python dictionary where the keys become the index labels.

In [None]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_from_dict = pd.Series(data)
series_from_dict

a    1
b    2
c    3
d    4
e    5
dtype: int64
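You can also combine a dictionary with an explicit index. In the sketch below (the label 'z' is made up for illustration), the index picks and orders the keys, and a label with no matching key should come out as NaN.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# 'z' is a hypothetical label that is not a key in data
series_subset = pd.Series(data, index=['c', 'a', 'z'])
print(series_subset)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.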

**From a Scalar Value:**

You can create a Series in which every element holds the same scalar value. The length of the Series is determined by the index you pass.

In [None]:
scalar_value = 0
series_from_scalar = pd.Series(scalar_value, index=['a', 'b', 'c'])
series_from_scalar

a    0
b    0
c    0
dtype: int64

**Using pd.Series Constructor with Data and Index Parameters:**

You can create a Series using the pd.Series constructor, specifying both data and index.

In [None]:
data = [10, 20, 30]
custom_index = ['x', 'y', 'z']
series_with_constructor = pd.Series(data=data, index=custom_index)
series_with_constructor

x    10
y    20
z    30
dtype: int64

## Accessing Data from Series with Position

You can access data from a Pandas Series by position (integer-based indexing) with the .iloc[] indexer. Positions start at 0, regardless of the index labels.

In [None]:
import pandas as pd

# Create a sample Series
data = [10, 20, 30, 40, 50]
custom_index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=custom_index)
series

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [None]:
# Accessing data by position using .iloc[]
# .iloc[] takes an integer or a slice of integers as input
# Single position:
value_at_position = series.iloc[2]  # Access the third element (30)
print("Value at position 2:", value_at_position)

Value at position 2: 30


In [None]:
# Slicing:
slice_of_series = series.iloc[1:4]  # Access elements from position 1 to 3
print("Slice of Series:")
print(slice_of_series)

Slice of Series:
b    20
c    30
d    40
dtype: int64


In [None]:
# Accessing multiple positions:
multiple_positions = series.iloc[[0, 3, 4]]  # Access elements at positions 0, 3, and 4
print("Multiple positions:")
print(multiple_positions)

Multiple positions:
a    10
d    40
e    50
dtype: int64
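For comparison, here is a minimal sketch of label-based access with .loc[] next to .iloc[], using the same sample Series as above. Note that a label slice includes its end label, while a position slice does not include its end position.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = series.loc['c']         # look up by index label
by_position = series.iloc[2]       # look up by integer position

label_slice = series.loc['b':'d']  # labels b, c, d (end label included)
position_slice = series.iloc[1:3]  # positions 1 and 2 (end position excluded)

print(by_label, by_position)
print(label_slice)
print(position_slice)
```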


# **For practice:**
### **Creating Series:**

1. Create a Pandas Series from a list of integers [1, 2, 3, 4, 5].

2. Create a Series from a Python dictionary where keys are countries and values are their populations.

3. Generate a Series containing the first 10 even numbers.

4. Create a Series of your favorite colors.

5. Create a Series with labels as days of the week and values as the corresponding temperatures.

6. Create a pandas Series from a list.

7. Create a pandas Series from a NumPy array.

8. Create a pandas Series with custom index labels.

9. Create a pandas Series with a specific data type (e.g., float).

10. Create a pandas Series with a name.

11. Create a pandas Series with a range of values from 1 to 5.

12. Create a pandas Series from a dictionary.

13. Create a pandas Series with a specific index.

14. Create a pandas Series with missing data (e.g., None values).

15. Create a pandas Series with a date range as an index.

### **Accessing Elements:**

# Create a sample Series
data = [10, 20, 30, 40, 50]
index_labels = ['A', 'B', 'C', 'D', 'E']
series = pd.Series(data, index=index_labels)

Exercise 1: Access the first element of the Series.

Exercise 2: Access the last element of the Series.

Exercise 3: Access an element by label (e.g., 'B').

Exercise 4: Access an element by integer position (e.g., the 3rd element).

Exercise 5: Access multiple elements by label using a list (e.g., 'A' and 'C').

Exercise 6: Access elements using boolean indexing for values greater than 30.

Exercise 7: Access elements using integer slicing (e.g., from the 2nd to the 4th element).

Exercise 8: Access elements in reverse order using slicing.

Exercise 9: Access elements with a step of 2 (e.g., every second element) using slicing.

Exercise 10: Access elements based on a condition, such as elements less than 25.

## Section 2:

1. Retrieve the third element from a Series of names.

2. Access the value associated with the key 'Canada' in a population Series.

3. Get the last element from a Series of temperatures.

4. Extract the values between the second and fifth positions in a Series of characters.

5. Retrieve the value associated with the label 'Wednesday' from a Series of temperatures.


## Iterating through a pandas Series
1.	How can you iterate through a pandas Series to access each element one by one?
2.	How can you iterate through a Series and print each element's value and index label?
3.	What loop construct can you use to iterate through the elements of a Series?
4.	How can you iterate through a Series and apply a function to each element?
5.	Can you iterate through a Series and skip elements that meet a specific condition?
6.	How do you iterate through a Series in reverse order?
7.	How can you iterate through a Series and check if a specific value exists in the Series?
8.	How can you iterate through a Series and count how many elements meet a certain condition?
9.	What is the role of the .items() method (called .iteritems() before pandas 2.0) in iterating through a Series?
10.	How can you iterate through a Series while also accessing the corresponding index labels?
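As a possible starting point for the questions above, here is a small sketch. The temperature values are made up for illustration; it shows iteration over values only and over (label, value) pairs with .items().

```python
import pandas as pd

# Hypothetical sample data
temperatures = pd.Series({'Monday': 25, 'Tuesday': 28, 'Wednesday': 30})

# Iterating over a Series directly yields its values
for temp in temperatures:
    print(temp)

# .items() yields (index label, value) pairs
for day, temp in temperatures.items():
    print(day, temp)

# Count the elements that meet a condition while iterating
count_warm = sum(1 for temp in temperatures if temp > 26)
print(count_warm)
```

For larger Series, vectorized expressions such as (temperatures > 26).sum() are usually preferred over explicit loops.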


## Modifying elements within a pandas Series:
1.	How can you change the value of a specific element in a pandas Series?
2.	What method can you use to replace multiple elements in a Series with new values?
3.	How can you add a new element with a label to an existing Series?
4.	What is the process for removing a specific element from a Series?
5.	How do you update elements in a Series based on a certain condition or criteria?
6.	What method can you use to fill missing values in a Series with a specific value?
7.	How can you change the data type of elements within a Series?
8.	What is the purpose of the .apply() method in modifying Series elements?
9.	How do you rename the index labels of a Series?
10.	What technique can you use to sort the elements in a Series in ascending or descending order?


## Arithmetic operations on pandas Series:

1.	How can you add a constant value to all elements in a pandas Series?
2.	What method can you use to subtract one Series from another?
3.	How do you multiply each element in a Series by a specific value?
4.	How can you divide a Series by another Series element-wise?
5.	What is the operation to calculate the square of each element in a Series?
6.	How do you find the absolute values of all elements in a Series?
7.	What method can you use to calculate the exponential values of elements in a Series?
8.	How can you apply a custom function to perform arithmetic operations on a Series?
9.	What operation can you use to calculate the cumulative sum of elements in a Series?
10.	How do you find the minimum and maximum values in a Series using arithmetic operations?


## Data aggregation functions in pandas Series:
1.	How can you find the sum of all elements in a pandas Series?
2.	What method can you use to calculate the mean (average) of the elements in a Series?
3.	How do you find the minimum and maximum values in a Series?
4.	What is the operation for calculating the median of a Series?
5.	How can you determine the number of non-null elements in a Series?
6.	What method can you use to compute the standard deviation of a Series?
7.	How do you find the indices (labels) of the top N largest elements in a Series?
8.	What is the operation for counting the occurrences of unique elements in a Series?
9.	How can you calculate the product of all elements in a Series?
10.	What method can you use to compute a summary of statistical information for a Series, including quartiles and more?


## Data manipulation functions in pandas Series:

1.	How can you sort the elements in a pandas Series in ascending order?
2.	What is the method for sorting a Series in descending order?
3.	How do you remove duplicate values from a Series?
4.	What function can you use to reindex a Series and add missing index labels?
5.	How can you filter a Series to select elements greater than a specific value?
6.	What is the operation for mapping values in a Series to new values using a dictionary?
7.	How can you replace specific values in a Series with new values?
8.	What method allows you to drop specific elements from a Series by index label?
9.	How do you concatenate two or more Series to create a new Series?
10.	What operation can you use to reset the index of a Series and remove the existing index labels?


## Time-related functions and operations with pandas Series:

1.	How can you create a pandas Series with a time-based index using a date range?
2.	What method can you use to extract the year component from a Series with a datetime index?
3.	How do you extract the month component from a time-based Series?
4.	What operation allows you to add a specific number of days to a Series with a datetime index?
5.	How can you resample daily time series data to a monthly frequency and calculate the mean for each month?
6.	What method can you use to shift the time index of a Series forward or backward by a specified number of periods?
7.	How do you filter a time-based Series to select data for a specific year (e.g., 2022)?
8.	What is the process for calculating a rolling mean for a time series using a moving window?
9.	How can you set the time-based Series as the index for a pandas DataFrame?
10.	What operation allows you to interpolate missing values in a time-based Series using linear interpolation?


## Handling missing data in pandas Series:

1.	How can you identify missing values in a pandas Series?
2.	What method can you use to drop rows with missing values from a Series?
3.	How do you fill missing values in a Series with a specific constant value?
4.	What method can you use to forward-fill missing values in a Series?
5.	How can you backward-fill missing values in a Series?
6.	What operation allows you to perform linear interpolation to fill missing values in a Series?
7.	How do you check if a specific element in a Series is missing (e.g., NaN)?
8.	What is the process for removing duplicate values, including rows with missing data, from a Series?
9.	How can you filter and select elements in a Series based on the absence or presence of missing values?
10.	What method allows you to count the number of missing values in a pandas Series?


In [None]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

In [None]:
import pandas as pd

data = {'USA': 331002651, 'China': 1439323776, 'India': 1380004385}
series = pd.Series(data)


In [None]:
import pandas as pd

even_numbers = pd.Series(range(2, 21, 2))


In [None]:
import pandas as pd

colors = pd.Series(['Red', 'Blue', 'Green', 'Yellow', 'Purple'])


In [None]:
import pandas as pd

data = {'Monday': 25, 'Tuesday': 28, 'Wednesday': 30, 'Thursday': 29, 'Friday': 27}
temperatures = pd.Series(data)


In [None]:
#accessing data

In [None]:
# Use .iloc[] for positional access; plain series[2] is ambiguous when the index holds labels
third_name = series.iloc[2]

In [None]:
# Assumes the population Series has a 'Canada' label; otherwise this raises a KeyError
population_canada = series['Canada']

In [None]:
last_temperature = temperatures.iloc[-1]

In [None]:
# Positions 1 to 4, i.e. the second through fifth elements
subset_characters = series.iloc[1:5]

In [None]:
temperature_wednesday = temperatures['Wednesday']

# **Python Pandas - DataFrame**

## Creating Data Frame

**From a Dictionary of Lists or NumPy Arrays:**

You can create a DataFrame from a dictionary where the keys are column names, and the values are lists or NumPy arrays containing the data for each column.

In [None]:
import pandas as pd

data = {'Column1': [1, 2, 3],
        'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df

   Column1 Column2
0        1       A
1        2       B
2        3       C


In [None]:
numeric_data = df.select_dtypes(include=['number'])
numeric_data

   Column1
0        1
1        2
2        3


**From a List of Dictionaries:**

You can create a DataFrame from a list of dictionaries where each dictionary represents a row, and keys are column names.

In [None]:
import pandas as pd

data = [{'Column1': 1, 'Column2': 'A'},
        {'Column1': 2, 'Column2': 'B'},
        {'Column1': 3, 'Column2': 'C'}]
df = pd.DataFrame(data)
print(df)

   Column1 Column2
0        1       A
1        2       B
2        3       C


**From a NumPy Array:**

You can create a DataFrame from a NumPy array, where each column in the array corresponds to a column in the DataFrame.

In [None]:
import pandas as pd
import numpy as np

data = np.array([[1, 'A'], [2, 'B'], [3, 'C']])
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
df

  Column1 Column2
0       1       A
1       2       B
2       3       C
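One thing to watch for: a NumPy array holds a single dtype, so mixing numbers and strings as in the cell above turns every value into a string. The sketch below checks this and converts Column1 back to integers with astype(), which is one possible fix.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 'A'], [2, 'B'], [3, 'C']])
print(data.dtype)  # a Unicode string dtype, not an integer dtype

df = pd.DataFrame(data, columns=['Column1', 'Column2'])
print(df.dtypes)   # Column1 is not numeric at this point

# Convert the numeric-looking column back to integers
df['Column1'] = df['Column1'].astype(int)
print(df.dtypes)
```

If the columns have different types from the start, a dictionary of lists avoids the conversion step.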


**From a CSV or Other Data File:**

You can create a DataFrame by reading data from a CSV file or other data sources using pd.read_csv() or similar functions.

In [None]:
# Assuming 'data.csv' contains tabular data
df = pd.read_csv('data.csv')
df.head()

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
0,1,Alfreds Futterkiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany
1,2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la ConstituciÃ³n 2222,MÃ©xico D.F.,5021,Mexico
2,3,Antonio Moreno TaquerÃ­a,Antonio Moreno,Mataderos 2312,MÃ©xico D.F.,5023,Mexico
3,4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK
4,5,Berglunds snabbkÃ¶p,Christina Berglund,BerguvsvÃ¤gen 8,LuleÃ¥,S-958 22,Sweden
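To try pd.read_csv() without a file on disk, you can read from an in-memory string. The sketch below uses a tiny made-up extract (not the real data.csv) and the dtype parameter to keep postal codes as text.

```python
import io
import pandas as pd

# Hypothetical miniature CSV, standing in for a real file
csv_text = "CustomerID,City,PostalCode\n1,Berlin,12209\n2,London,WA1 1DP\n"

# dtype keeps PostalCode as strings instead of letting pandas infer a type
df_small = pd.read_csv(io.StringIO(csv_text), dtype={'PostalCode': str})
print(df_small)
print(df_small.dtypes)
```

When reading a real file, pd.read_csv() also accepts an encoding parameter. If accented characters look garbled, the file's encoding may differ from the default, and it can be worth checking.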


**From a Dictionary of Series or Pandas DataFrames:**

You can create a DataFrame from a dictionary where the values are Series or other DataFrames. The keys become column names.

In [None]:
import pandas as pd

data = {'Column1': pd.Series([1, 2, 3]),
        'Column2': pd.Series(['A', 'B', 'C'])}
df = pd.DataFrame(data)
df

   Column1 Column2
0        1       A
1        2       B
2        3       C
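When the Series do not share the same index, pandas aligns them by label. The sketch below uses made-up labels to illustrate this; positions with no matching label are filled with NaN.

```python
import pandas as pd

# Hypothetical Series with partially overlapping index labels
data = {'Column1': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'Column2': pd.Series(['X', 'Y'], index=['a', 'b'])}

df_aligned = pd.DataFrame(data)
print(df_aligned)  # row 'c' has no value in Column2
```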


## Accessing Data from Data Frame

**Accessing Columns by Name:**

You can access a specific column of a DataFrame by using square brackets and specifying the column name as a string.

In [None]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Access the 'Name' column
names = df['Name']
names

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

**Accessing Multiple Columns by Name:**

You can access multiple columns by providing a list of column names inside square brackets.

In [None]:
# Access both 'Name' and 'Age' columns
columns = df[['Name', 'Age']]
columns

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


**Accessing Rows by Index:**

You can access rows by their index label using the .loc[] indexer. You can specify a single label or a slice of labels. Unlike ordinary Python slicing, a .loc[] slice includes its end label.

In [None]:
# Access the row with index label 0
first_row = df.loc[0]
print("first_row:")
print(first_row)

print()
# Access multiple rows by label range (both endpoints are included)
rows_1_to_2 = df.loc[1:2]
print("rows_1_to_2:")
print(rows_1_to_2)

first_row:
Name    Alice
Age        25
Name: 0, dtype: object

rows_1_to_2:
      Name  Age
1      Bob   30
2  Charlie   35
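The difference between label slices and position slices is easy to miss when the index is the default 0, 1, 2. The sketch below, using the same sample data, shows that the "same" slice returns a different number of rows with .loc[] and .iloc[].

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

by_label = df.loc[1:2]      # labels 1 and 2: end label included
by_position = df.iloc[1:2]  # position 1 only: end position excluded

print(len(by_label), len(by_position))
```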


**Accessing Data by Row and Column:**

You can access a specific data point by specifying both the row and column using .loc[].

In [None]:
# Access data at row 1 and column 'Name'
data_point = df.loc[1, 'Name']
data_point

'Bob'

**Accessing Data by Position:**

You can access data by row and column positions using .iloc[].

In [None]:
# Access data at row 1 and column 0 (zero-based indexing)
data_point = df.iloc[1, 0]
data_point

'Bob'

**Boolean Indexing:**

You can filter rows based on a condition using boolean indexing.

In [None]:
# Filter rows where 'Age' is greater than 30
filtered_rows = df[df['Age'] > 30]
filtered_rows

      Name  Age
2  Charlie   35
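Conditions can also be combined. Here is a minimal sketch using the same sample data; note that pandas uses & and | (not the Python keywords and/or) and that each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Both conditions must hold
between = df[(df['Age'] >= 25) & (df['Age'] < 35)]

# At least one condition must hold
either = df[(df['Age'] < 30) | (df['Name'] == 'Charlie')]

print(between)
print(either)
```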


# Python Pandas - Basic Functionality

In [None]:
# Assuming 'data.csv' contains tabular data
import pandas as pd
df = pd.read_csv('/content/data.csv')
df

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
0,1,Alfreds Futterkiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany
1,2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la ConstituciÃ³n 2222,MÃ©xico D.F.,5021,Mexico
2,3,Antonio Moreno TaquerÃ­a,Antonio Moreno,Mataderos 2312,MÃ©xico D.F.,5023,Mexico
3,4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK
4,5,Berglunds snabbkÃ¶p,Christina Berglund,BerguvsvÃ¤gen 8,LuleÃ¥,S-958 22,Sweden
...,...,...,...,...,...,...,...
86,87,Wartian Herkku,Pirkko Koskitalo,Torikatu 38,Oulu,90110,Finland
87,88,Wellington Importadora,Paula Parente,"Rua do Mercado, 12",Resende,08737-363,Brazil
88,89,White Clover Markets,Karl Jablonski,305 - 14th Ave. S. Suite 3B,Seattle,98128,USA
89,90,Wilman Kala,Matti Karttunen,Keskuskatu 45,Helsinki,21240,Finland


## **Viewing Data:**

You can use various methods to view and inspect your DataFrame:

In [None]:
df.head()

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
0,1,Alfreds Futterkiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany
1,2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la ConstituciÃ³n 2222,MÃ©xico D.F.,5021,Mexico
2,3,Antonio Moreno TaquerÃ­a,Antonio Moreno,Mataderos 2312,MÃ©xico D.F.,5023,Mexico
3,4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK
4,5,Berglunds snabbkÃ¶p,Christina Berglund,BerguvsvÃ¤gen 8,LuleÃ¥,S-958 22,Sweden


In [None]:
df.tail()

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
86,87,Wartian Herkku,Pirkko Koskitalo,Torikatu 38,Oulu,90110,Finland
87,88,Wellington Importadora,Paula Parente,"Rua do Mercado, 12",Resende,08737-363,Brazil
88,89,White Clover Markets,Karl Jablonski,305 - 14th Ave. S. Suite 3B,Seattle,98128,USA
89,90,Wilman Kala,Matti Karttunen,Keskuskatu 45,Helsinki,21240,Finland
90,91,Wolski,Zbyszek,ul. Filtrowa 68,Walla,01-012,Poland


In [None]:
df.shape

(91, 7)

In [None]:
print(df.columns)

Index(['CustomerID', 'CustomerName', 'ContactName', 'Address', 'City',
       'PostalCode', 'Country'],
      dtype='object')


In [None]:
print(df.index)

RangeIndex(start=0, stop=91, step=1)


In [None]:
df.values

array([[1, 'Alfreds Futterkiste', 'Maria Anders', 'Obere Str. 57',
        'Berlin', '12209', 'Germany'],
       [2, 'Ana Trujillo Emparedados y helados', 'Ana Trujillo',
        'Avda. de la ConstituciÃ³n 2222', 'MÃ©xico D.F.', '5021',
        'Mexico'],
       [3, 'Antonio Moreno TaquerÃ\xada', 'Antonio Moreno',
        'Mataderos 2312', 'MÃ©xico D.F.', '5023', 'Mexico'],
       [4, 'Around the Horn', 'Thomas Hardy', '120 Hanover Sq.',
        'London', 'WA1 1DP', 'UK'],
       [5, 'Berglunds snabbkÃ¶p', 'Christina Berglund',
        'BerguvsvÃ¤gen 8', 'LuleÃ¥', 'S-958 22', 'Sweden'],
       [6, 'Blauer See Delikatessen', 'Hanna Moos', 'Forsterstr. 57',
        'Mannheim', '68306', 'Germany'],
       [7, 'Blondel pÃ¨re et fils', 'FrÃ©dÃ©rique Citeaux',
        '24, place KlÃ©ber', 'Strasbourg', '67000', 'France'],
       [8, 'BÃ³lido Comidas preparadas', 'MartÃ\xadn Sommer',
        'C/ Araquil, 67', 'Madrid', '28023', 'Spain'],
       [9, "Bon app'", 'Laurence Lebihans', '12, rue de

In [None]:
print(df.dtypes)

CustomerID       int64
CustomerName    object
ContactName     object
Address         object
City            object
PostalCode      object
Country         object
dtype: object


In [None]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   CustomerID    91 non-null     int64 
 1   CustomerName  91 non-null     object
 2   ContactName   91 non-null     object
 3   Address       91 non-null     object
 4   City          91 non-null     object
 5   PostalCode    90 non-null     object
 6   Country       91 non-null     object
dtypes: int64(1), object(6)
memory usage: 5.1+ KB
None


In [None]:
print(df.describe())

       CustomerID
count    91.00000
mean     46.00000
std      26.41338
min       1.00000
25%      23.50000
50%      46.00000
75%      68.50000
max      91.00000


## **Filtering Data:**

In [None]:
london_customers = df[df['City'] == 'London']
london_customers

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
3,4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK
10,11,B's Beverages,Victoria Ashworth,Fauntleroy Circus,London,EC2 5NT,UK
15,16,Consolidated Holdings,Elizabeth Brown,Berkeley Gardens 12 Brewery,London,WX1 6LT,UK
18,19,Eastern Connection,Ann Devon,35 King George,London,WX3 6FW,UK
52,53,North/South,Simon Crowther,South House 300 Queensbridge,London,SW7 1RZ,UK
71,72,Seven Seas Imports,Hari Kumar,90 Wadhurst Rd.,London,OX15 4NB,UK


In [None]:
usa_customers = df[df['Country'] == 'USA']
usa_customers

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
31,32,Great Lakes Food Market,Howard Snyder,2732 Baker Blvd.,Eugene,97403,USA
35,36,Hungry Coyote Import Store,Yoshi Latimer,City Center Plaza 516 Main St.,Elgin,97827,USA
42,43,Lazy K Kountry Store,John Steel,12 Orchestra Terrace,Walla Walla,99362,USA
44,45,Let's Stop N Shop,Jaime Yorres,87 Polk St. Suite 5,San Francisco,94117,USA
47,48,Lonesome Pine Restaurant,Fran Wilson,89 Chiaroscuro Rd.,Portland,97219,USA
54,55,Old World Delicatessen,Rene Phillips,2743 Bering St.,Anchorage,99508,USA
64,65,Rattlesnake Canyon Grocery,Paula Wilson,2817 Milton Dr.,Albuquerque,87110,USA
70,71,Save-a-lot Markets,Jose Pavarotti,187 Suffolk Ln.,Boise,83720,USA
74,75,Split Rail Beer & Ale,Art Braunschweiger,P.O. Box 555,Lander,82520,USA
76,77,The Big Cheese,Liz Nixon,89 Jefferson Way Suite 2,Portland,97201,USA
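To match several values at once, isin() is one option. The sketch below uses a small stand-in frame with a few names borrowed from the customer data, so it runs without data.csv.

```python
import pandas as pd

# Small stand-in for the customer DataFrame
customers = pd.DataFrame({
    'CustomerName': ['Alfreds Futterkiste', 'Around the Horn',
                     'The Big Cheese', 'Wilman Kala'],
    'Country': ['Germany', 'UK', 'USA', 'Finland'],
})

# Keep rows whose Country is in the list
uk_or_usa = customers[customers['Country'].isin(['UK', 'USA'])]

# ~ inverts the boolean mask
others = customers[~customers['Country'].isin(['UK', 'USA'])]

print(uk_or_usa)
print(others)
```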


## **Sorting Data:**

In [None]:
sorted_df = df.sort_values(by='CustomerName')
sorted_df

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
0,1,Alfreds Futterkiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany
1,2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la ConstituciÃ³n 2222,MÃ©xico D.F.,5021,Mexico
2,3,Antonio Moreno TaquerÃ­a,Antonio Moreno,Mataderos 2312,MÃ©xico D.F.,5023,Mexico
3,4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK
10,11,B's Beverages,Victoria Ashworth,Fauntleroy Circus,London,EC2 5NT,UK
...,...,...,...,...,...,...,...
86,87,Wartian Herkku,Pirkko Koskitalo,Torikatu 38,Oulu,90110,Finland
87,88,Wellington Importadora,Paula Parente,"Rua do Mercado, 12",Resende,08737-363,Brazil
88,89,White Clover Markets,Karl Jablonski,305 - 14th Ave. S. Suite 3B,Seattle,98128,USA
89,90,Wilman Kala,Matti Karttunen,Keskuskatu 45,Helsinki,21240,Finland


In [None]:
#Sorting by multiple columns
sorted_df = df.sort_values(by=['Country', 'City'])
sorted_df

Unnamed: 0,CustomerID,CustomerName,ContactName,Address,City,PostalCode,Country
11,12,Cactus Comidas para llevar,Patricio Simpson,Cerrito 333,Buenos Aires,1010,Argentina
53,54,OcÃ©ano AtlÃ¡ntico Ltda.,Yvonne Moncada,Ing. Gustavo Moncada 8585 Piso 20-A,Buenos Aires,1010,Argentina
63,64,Rancho grande,Sergio GutiÃ©rrez,Av. del Libertador 900,Buenos Aires,1010,Argentina
19,20,Ernst Handel,Roland Mendel,Kirchgasse 6,Graz,8010,Austria
58,59,Piccolo und mehr,Georg Pipps,Geislweg 14,Salzburg,5020,Austria
...,...,...,...,...,...,...,...
42,43,Lazy K Kountry Store,John Steel,12 Orchestra Terrace,Walla Walla,99362,USA
45,46,LILA-Supermercado,Carlos GonzÃ¡lez,Carrera 52 con Ave. BolÃ­var #65-98 Llano Largo,Barquisimeto,3508,Venezuela
32,33,GROSELLA-Restaurante,Manuel Pereira,5Âª Ave. Los Palos Grandes,Caracas,1081,Venezuela
46,47,LINO-Delicateses,Felipe Izquierdo,Ave. 5 de Mayo Porlamar,I. de Margarita,4980,Venezuela
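The sort direction can be set per column with the ascending parameter. The following sketch uses a small made-up frame; it sorts Country from A to Z and, within each country, City from Z to A.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({'Country': ['USA', 'UK', 'USA', 'UK'],
                   'City': ['Seattle', 'London', 'Boise', 'Cowes']})

# One boolean per sort column: Country ascending, City descending
sorted_df = df.sort_values(by=['Country', 'City'], ascending=[True, False])
print(sorted_df)
```

sort_values() returns a new DataFrame and keeps the original index labels, which is why the row labels in the output are no longer in order.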


## **unique() Method**

In [None]:
unique_categories = df['Country'].unique()
print(unique_categories)

['Germany' 'Mexico' 'UK' 'Sweden' 'France' 'Spain' 'Canada' 'Argentina'
 'Switzerland' 'Brazil' 'Austria' 'Italy' 'Portugal' 'USA' 'Venezuela'
 'Ireland' 'Belgium' 'Norway' 'Denmark' 'Finland' 'Poland']


In [None]:
unique_cities = df['City'].unique()
print(unique_cities)

['Berlin' 'MÃ©xico D.F.' 'London' 'LuleÃ¥' 'Mannheim' 'Strasbourg'
 'Madrid' 'Marseille' 'Tsawassen' 'Buenos Aires' 'Bern' 'SÃ£o Paulo'
 'Aachen' 'Nantes' 'Graz' 'Lille' 'BrÃ¤cke' 'MÃ¼nchen' 'Torino' 'Lisboa'
 'Barcelona' 'Sevilla' 'Campinas' 'Eugene' 'Caracas' 'Rio de Janeiro'
 'San CristÃ³bal' 'Elgin' 'Cork' 'Cowes' 'Brandenburg' 'Versailles'
 'Toulouse' 'Vancouver' 'Walla Walla' 'Frankfurt a.M.' 'San Francisco'
 'Barquisimeto' 'I. de Margarita' 'Portland' 'Bergamo' 'Bruxelles'
 'MontrÃ©al' 'Leipzig' 'Anchorage' 'KÃ¶ln' 'Paris' 'Salzburg' 'Cunewalde'
 'Albuquerque' 'Reggio Emilia' 'GenÃ¨ve' 'Stavern' 'Boise' 'KÃ¸benhavn'
 'Lander' 'Charleroi' 'Butte' 'MÃ¼nster' 'Kirkland' 'Ã\x85rhus' 'Lyon'
 'Reims' 'Stuttgart' 'Oulu' 'Resende' 'Seattle' 'Helsinki' 'Walla']
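Two related methods are often used alongside unique(): nunique() returns how many distinct values there are, and value_counts() returns how often each one occurs. A short sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample values
countries = pd.Series(['USA', 'UK', 'USA', 'Canada', 'USA', 'UK'])

n_unique = countries.nunique()     # number of distinct values
counts = countries.value_counts()  # occurrences, most frequent first

print(n_unique)
print(counts)
```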


## **Handling Missing Data:**

In [None]:
df.isna().sum()

CustomerID      0
CustomerName    0
ContactName     0
Address         0
City            0
PostalCode      1
Country         0
dtype: int64

In [None]:
df_cleaned = df.dropna()

In [None]:
df_cleaned.isna().sum()

CustomerID      0
CustomerName    0
ContactName     0
Address         0
City            0
PostalCode      0
Country         0
dtype: int64

## Handling Missing Data:
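Pandas provides functions to handle missing data, such as dropna(), fillna(), and isna(). Each returns a new object and leaves the original DataFrame unchanged, so assign the result if you want to keep it.

In [None]:
# Remove rows with missing values (returns a new DataFrame)
df_no_missing = df.dropna()

# Fill missing values with a specific value (returns a new DataFrame)
df_filled = df.fillna(0)

# Check for missing values (returns a boolean DataFrame)
df.isna()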
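These functions can also target specific columns. The sketch below uses a small made-up frame; it drops rows only when PostalCode is missing and, alternatively, fills that column with a placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing postal codes
customers = pd.DataFrame({'City': ['Berlin', 'London', 'Cork'],
                          'PostalCode': ['12209', np.nan, np.nan]})

missing_per_column = customers.isna().sum()

# Drop rows only if PostalCode is missing
complete_rows = customers.dropna(subset=['PostalCode'])

# Fill per column by passing a dictionary of column -> fill value
filled = customers.fillna({'PostalCode': 'unknown'})

print(missing_per_column)
print(complete_rows)
print(filled)
```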

## **Exporting Data:**

In [None]:
df.to_csv('customer_data.csv', index=False)
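To see what index=False does without writing a file, you can export to an in-memory buffer. The sketch below uses a small made-up frame and reads the text back with pd.read_csv().

```python
import io
import pandas as pd

# Hypothetical sample frame
df = pd.DataFrame({'CustomerID': [1, 2], 'Country': ['Germany', 'Mexico']})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the exported text back into a DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as "Unnamed: 0" when the file is read back.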

## **Aggregation:**

In [None]:
import pandas as pd

# Sample DataFrame
data = {
    'CustomerID': [101, 102, 103, 104, 105],
    'CustomerName': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [28, 35, 22, 45, 30],
    'Country': ['USA', 'Canada', 'USA', 'UK', 'Canada']
}

df = pd.DataFrame(data)
df

   CustomerID CustomerName  Age Country
0         101        Alice   28     USA
1         102          Bob   35  Canada
2         103      Charlie   22     USA
3         104        David   45      UK
4         105          Eve   30  Canada


In [None]:
# Group customers by 'Country' and calculate the mean of 'Age' within each group
country_avg_age = df.groupby('Country')['Age'].mean()
print(country_avg_age)

Country
Canada    32.5
UK        45.0
USA       25.0
Name: Age, dtype: float64
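Several aggregations can be computed in one pass with agg(). A minimal sketch using the same sample data as above:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerName': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [28, 35, 22, 45, 30],
    'Country': ['USA', 'Canada', 'USA', 'UK', 'Canada'],
})

# One row per country, one column per aggregation
country_stats = df.groupby('Country')['Age'].agg(['mean', 'min', 'max', 'count'])
print(country_stats)
```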


## Pandas - Descriptive Statistics

In [None]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
df

      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65


In [None]:
df.sum()

Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object

In [None]:
# Sum across each row; numeric_only=True skips the text column 'Name'
df.sum(axis=1, numeric_only=True)

0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64

In [None]:
# numeric_only=True is needed because 'Name' holds text
df.mean(numeric_only=True)

Age       31.833333
Rating     3.743333
dtype: float64

In [None]:
# numeric_only=True is needed because 'Name' holds text
df.std(numeric_only=True)

Age       9.232682
Rating    0.661628
dtype: float64

In [None]:
df.describe()

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000


In [None]:
df.describe(include=['object'])

       Name
count    12
unique   12
top     Tom
freq      1


In [None]:
df.describe(include='all')

       Name        Age     Rating
count    12  12.000000  12.000000
unique   12        NaN        NaN
top     Tom        NaN        NaN
freq      1        NaN        NaN
mean    NaN  31.833333   3.743333
std     NaN   9.232682   0.661628
min     NaN  23.000000   2.560000
25%     NaN  25.000000   3.230000
50%     NaN  29.500000   3.790000
75%     NaN  35.500000   4.132500
max     NaN  51.000000   4.800000
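In recent pandas versions (2.0 and later), reductions such as mean() and std() no longer skip text columns silently. The sketch below uses the first three rows of the data above; it passes numeric_only=True so that the Name column is left out.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25],
                   'Rating': [4.23, 3.24, 3.98]})

# Column-wise mean over numeric columns only
column_means = df.mean(numeric_only=True)

# Row-wise sum over numeric columns only
row_sums = df.sum(axis=1, numeric_only=True)

print(column_means)
print(row_sums)
```

An alternative is to select the numeric columns first, for example with df.select_dtypes(include=['number']).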


# **For practice:**

1. **Write a Pandas program to create a dataframe from a dictionary and display it.**

Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
2. **Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.**

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

3.  **Write a Pandas program to display a summary of the basic information about a specified DataFrame and its data.**
Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

4. **Write a Pandas program to get the first 3 rows of a given DataFrame.**
Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
5. **Write a Pandas program to select the 'name' and 'score' columns from the following DataFrame.**

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

6. **Write a Pandas program to select the specified columns and rows from a given data frame.**

Sample Python dictionary data and list labels:

Select 'name' and 'score' columns in rows 1, 3, 5, 6 from the following data frame.

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],



'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

7. **Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.**

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

8. **Write a Pandas program to count the number of rows and columns of a DataFrame.**


Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

9. **Write a Pandas program to select the rows where the score is missing, i.e. is NaN.**

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

10. **Write a Pandas program to select the rows where the score is between 15 and 20 (inclusive).**

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [None]:
import pandas as pd
df = pd.DataFrame({'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]})
print(df)

    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print(df)

        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes
d      James    NaN         3      no
e      Emily    9.0         2      no
f    Michael   20.0         3     yes
g    Matthew   14.5         1     yes
h      Laura    NaN         1      no
i      Kevin    8.0         2      no
j      Jonas   19.0         1     yes


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Summary of the basic information about this DataFrame and its data:")
# df.info() prints its report itself and returns None, so it is not wrapped in print()
df.info()

Summary of the basic information about this DataFrame and its data:
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   name      10 non-null     object 
 1   score     8 non-null      float64
 2   attempts  10 non-null     int64  
 3   qualify   10 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("First three rows of the data frame:")
print(df.iloc[:3])

First three rows of the data frame:
        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Select specific columns:")
print(df[['name', 'score']])

Select specific columns:
        name  score
a  Anastasia   12.5
b       Dima    9.0
c  Katherine   16.5
d      James    NaN
e      Emily    9.0
f    Michael   20.0
g    Matthew   14.5
h      Laura    NaN
i      Kevin    8.0
j      Jonas   19.0


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Select specific columns and rows:")
# Column positions 0 and 1 are 'name' and 'score'
print(df.iloc[[1, 3, 5, 6], [0, 1]])

Select specific columns and rows:
      name  score
b     Dima    9.0
d    James    NaN
f  Michael   20.0
g  Matthew   14.5


In [None]:
import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts' : [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Number of attempts in the examination is greater than 2:")
print(df[df['attempts'] > 2])

Number of attempts in the examination is greater than 2:
      name  score  attempts qualify
b     Dima    9.0         3      no
d    James    NaN         3      no
f  Michael   20.0         3     yes


In [None]:
import pandas as pd
import numpy as np
exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
# df.shape is a (rows, columns) tuple
total_rows, total_cols = df.shape
print(f"Number of Rows: {total_rows}")
print(f"Number of Columns: {total_cols}")

Number of Rows: 10
Number of Columns: 4


In [None]:
import pandas as pd
import numpy as np
exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Rows where score is missing:")
print(df[df['score'].isnull()])

Rows where score is missing:
    name  score  attempts qualify
d  James    NaN         3      no
h  Laura    NaN         1      no


In [None]:
import pandas as pd
import numpy as np
exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)
print("Rows where score between 15 and 20 (inclusive):")
print(df[df['score'].between(15, 20)])

Rows where score between 15 and 20 (inclusive):
        name  score  attempts qualify
c  Katherine   16.5         2     yes
f    Michael   20.0         3     yes
j      Jonas   19.0         1     yes


dataframe creation

1.	How can you create a pandas DataFrame from a dictionary where keys become column names and values become column data?
2.	What method allows you to create a DataFrame from a list of lists with specified column names?
3.	How can you create a DataFrame from a NumPy array with custom row and column labels?
4.	What is the process for creating a DataFrame from a CSV file stored on your local machine?
5.	How can you create a DataFrame from a CSV file hosted on a remote URL?
6.	What method can you use to create an empty DataFrame with predefined column names?
7.	How do you create a DataFrame from a SQL database table using a SQL query?
8.	What operation allows you to create a DataFrame by concatenating two or more existing DataFrames vertically?
9.	How can you create a DataFrame by joining two DataFrames based on a common column or index?
10.	What method allows you to generate a DataFrame with random data for testing and experimentation?
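
As a rough sketch for a few of the creation questions above, the following builds DataFrames from a dictionary, a list of lists, and a NumPy array, creates an empty frame, and stacks two frames vertically. The column names and values are made up for illustration and are not part of the exercises.

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column names, values become column data
df_dict = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [30, 25]})

# From a list of lists, with column names given explicitly
df_rows = pd.DataFrame([['Ann', 30], ['Bob', 25]], columns=['name', 'age'])

# From a NumPy array with custom row and column labels
arr = np.array([[1, 2], [3, 4]])
df_arr = pd.DataFrame(arr, index=['r1', 'r2'], columns=['c1', 'c2'])

# An empty DataFrame with predefined column names
df_empty = pd.DataFrame(columns=['name', 'age'])

# Vertical concatenation; ignore_index=True renumbers the rows from 0
df_stacked = pd.concat([df_dict, df_rows], ignore_index=True)

print(df_stacked)
```

Reading from a file or URL follows the same pattern with `pd.read_csv(path_or_url)`, and `pd.read_sql(query, connection)` covers the SQL case once a database connection is available.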

data exploration on data frame

1.	How can you check the first few rows of a DataFrame to get an overview of the data?
2.	What method allows you to check the basic statistics (e.g., count, mean, min, max) for numerical columns in a DataFrame?
3.	How do you identify and count unique values in a specific column of a DataFrame?
4.	What method can you use to sort a DataFrame by values in a specific column in ascending order?
5.	How can you filter a DataFrame to select rows that meet a specific condition (e.g., values greater than a certain threshold)?
6.	What operation allows you to group data in a DataFrame by a specific column and calculate statistics (e.g., mean, sum) for each group?
7.	How do you pivot a DataFrame to transform data from long format to wide format?
8.	What is the process for merging or joining two DataFrames based on a common column or key?
9.	How can you create a pivot table to summarize data and perform aggregation in a tabular form?
10.	What method allows you to plot and visualize data in a DataFrame, such as creating a histogram or scatter plot?
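
One possible way to try several of the exploration questions above is sketched here on a small, invented sales table (the `city` and `sales` columns are assumptions for the example).

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Pune', 'Delhi', 'Pune', 'Delhi', 'Mumbai'],
    'sales': [10, 20, 30, 40, 50],
})

first_rows = df.head(3)                   # first few rows for a quick overview
stats = df.describe()                     # count, mean, std, min, quartiles, max
n_cities = df['city'].nunique()           # number of distinct values
city_counts = df['city'].value_counts()   # how often each value occurs
by_sales = df.sort_values('sales')        # ascending order is the default
big = df[df['sales'] > 25]                # rows that meet a condition
mean_by_city = df.groupby('city')['sales'].mean()  # one statistic per group

print(stats)
print(mean_by_city)
```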

data selection and filtering on data frame

1.	How can you select a specific column in a pandas DataFrame?
2.	What method allows you to select multiple columns from a DataFrame?
3.	How do you select rows that meet a specific condition in a DataFrame (e.g., values greater than 50)?
4.	What is the process for selecting rows and columns using integer-based indexing in a DataFrame?
5.	How can you filter a DataFrame to select rows where a specific column's values are not null?
6.	What method allows you to filter rows based on multiple conditions (e.g., values in two columns)?
7.	How do you select rows with a specific value in a categorical column of a DataFrame?
8.	What operation allows you to use string methods to filter rows based on text content in a column?
9.	How can you apply a custom function to filter rows in a DataFrame based on a specific criterion?
10.	What method allows you to sample a random subset of rows from a DataFrame for analysis or testing?
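
The selection and filtering questions above could be approached along these lines. This is a sketch on invented data; the `name`, `score`, and `group` columns are assumptions for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cara', 'Dan'],
    'score': [55.0, np.nan, 72.0, 48.0],
    'group': ['A', 'B', 'A', 'B'],
})

one_col = df['score']                     # a single column comes back as a Series
two_cols = df[['name', 'score']]          # a list of names gives a DataFrame
above_50 = df[df['score'] > 50]           # boolean condition on one column
by_position = df.iloc[0:2, 0:2]           # first two rows and first two columns
not_null = df[df['score'].notna()]        # rows where score is present
# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df[(df['score'] > 50) & (df['group'] == 'A')]
starts_c = df[df['name'].str.startswith('C')]   # filter with a string method
sample = df.sample(n=2, random_state=0)   # random subset, reproducible via the seed

print(both)
```

Note that a comparison against NaN evaluates to False, so the row with a missing score is excluded from `above_50` without any extra handling.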

data manipulation on data frame
1.	How can you add a new column to a DataFrame with calculated values based on existing columns?
2.	What is the process for renaming columns in a DataFrame to make them more descriptive?
3.	How do you drop a specific column from a DataFrame?
4.	What method allows you to apply a function to each element in a specific column and create a new column with the results?
5.	How can you sort a DataFrame in ascending order based on values in one or more columns?
6.	What operation allows you to change the data type of a specific column in a DataFrame?
7.	How do you pivot a DataFrame to transform it from wide format to long format?
8.	What is the process for merging or joining two DataFrames based on a common column or key?
9.	How can you apply a filter to select rows in a DataFrame based on a specific condition?
10.	What method allows you to create a summary table that groups and aggregates data in a DataFrame, often called a pivot table?
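
A minimal sketch of several manipulation steps from the list above, using an invented table of items (`item`, `price`, `qty` are assumed names):

```python
import pandas as pd

df = pd.DataFrame({'item': ['pen', 'book'], 'price': [2.5, 10.0], 'qty': [4, 3]})

df['total'] = df['price'] * df['qty']           # new column computed from existing ones
df = df.rename(columns={'qty': 'quantity'})     # give a column a more descriptive name
df['name_length'] = df['item'].apply(len)       # apply a function to each element
df['quantity'] = df['quantity'].astype(float)   # change a column's data type
df = df.drop(columns=['price'])                 # drop a column
df = df.sort_values('total', ascending=True)    # sort by one column

# Wide to long: each (item, variable) pair becomes its own row
long_df = df.melt(id_vars='item', value_vars=['quantity', 'total'])

print(df)
print(long_df)
```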

data grouping and aggregation on data frame

1.	How do you group data in a pandas DataFrame based on a specific column and calculate the sum of each group's values?
2.	What method allows you to group a DataFrame by multiple columns and calculate the mean value for each group?
3.	How can you group data in a DataFrame by a categorical column and count the number of occurrences for each category?
4.	What is the process for aggregating data in a DataFrame to find the minimum and maximum values for each group?
5.	How do you group data in a DataFrame by a specific column and calculate statistics such as the median and standard deviation for each group?
6.	What method allows you to apply custom aggregation functions to grouped data in a DataFrame?
7.	How can you group a DataFrame by a date or time-based column and calculate the total sum for each time period (e.g., daily, monthly)?
8.	What operation allows you to unstack a grouped DataFrame to pivot it from a hierarchical index to a more straightforward tabular format?
9.	How do you group data in a DataFrame by one or more columns and create a pivot table to display summary information in a structured way?
10.	What method allows you to apply multiple aggregation functions simultaneously to grouped data in a DataFrame, creating a summary with various statistics?
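
The grouping questions above can be explored with a sketch like the following. The `team`, `year`, and `points` columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B', 'B'],
    'year': [2022, 2023, 2022, 2023, 2023],
    'points': [10, 20, 5, 15, 25],
})

total_by_team = df.groupby('team')['points'].sum()                  # sum per group
mean_by_team_year = df.groupby(['team', 'year'])['points'].mean()   # several keys
rows_per_team = df.groupby('team').size()                           # rows per group
# Several aggregations at once, one output column per function
summary = df.groupby('team')['points'].agg(['min', 'max', 'median', 'std'])
# A custom aggregation: the range of points within each team
spread = df.groupby('team')['points'].agg(lambda s: s.max() - s.min())
# unstack() moves the inner index level (year) into columns
wide = mean_by_team_year.unstack()

print(summary)
print(wide)
```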

handling missing data on data frame
1.	How do you identify missing values in a pandas DataFrame?
2.	What method allows you to drop rows with missing values in a DataFrame?
3.	How can you fill missing values in a specific column with a constant value?
4.	What operation allows you to forward-fill missing values in a DataFrame?
5.	How do you backward-fill missing values in a DataFrame?
6.	What method can you use to interpolate missing values in a DataFrame using linear interpolation?
7.	How can you replace missing values in a DataFrame with the mean of the respective column?
8.	What is the process for checking if a specific element in a DataFrame is missing (e.g., NaN)?
9.	How do you filter rows in a DataFrame to select those that have missing values in a particular column?
10.	What method allows you to count the number of missing values in each column of a DataFrame?
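
As an illustration of the missing-data questions above, here is a short sketch on a two-column frame with a few NaN values. The data is invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, np.nan], 'b': [np.nan, 2.0, 2.0, 4.0]})

mask = df.isna()                       # True wherever a value is missing
missing_per_col = df.isna().sum()      # number of missing values per column
dropped = df.dropna()                  # keep only rows without missing values
filled_const = df['a'].fillna(0)       # fill one column with a constant
ffilled = df.ffill()                   # carry the last valid value forward
bfilled = df.bfill()                   # pull the next valid value backward
interp = df['a'].interpolate()         # linear interpolation is the default
filled_mean = df.fillna(df.mean())     # fill each column with its own mean
rows_missing_a = df[df['a'].isna()]    # rows where column 'a' is missing
is_missing = pd.isna(df.loc[1, 'a'])   # check one specific element

print(missing_per_col)
print(filled_mean)
```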

data cleaning on data frame

1.	How can you identify and handle duplicate rows in a pandas DataFrame?
2.	What method allows you to rename columns in a DataFrame to make them more descriptive?
3.	How do you drop specific columns from a DataFrame to remove unnecessary data?
4.	What operation can you use to change the data type of a column in a DataFrame?
5.	How can you remove leading and trailing whitespaces from text data in a DataFrame?
6.	What is the process for converting a categorical column in a DataFrame to one-hot encoded columns?
7.	How do you handle outliers by filtering or transforming values in a DataFrame?
8.	What method allows you to replace specific values in a column with new values in a DataFrame?
9.	How can you check for and handle missing data (e.g., NaN values) in a DataFrame?
10.	What operation allows you to standardize or normalize numerical data in a DataFrame to have a common scale?
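
A hedged sketch of a small cleaning pass covering several of the questions above. The data, the outlier threshold of 100, and the choice of min-max scaling are all assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': [' Ann ', 'Bob', 'Bob', 'Cara'],
    'colour': ['red', 'blue', 'blue', 'red'],
    'value': ['1', '2', '2', '300'],
})

df = df.drop_duplicates()                    # remove fully duplicated rows
df['name'] = df['name'].str.strip()          # trim leading/trailing whitespace
df['value'] = df['value'].astype(int)        # text digits to integers
df['colour'] = df['colour'].replace({'blue': 'navy'})   # replace specific values
no_outliers = df[df['value'] < 100]          # filter by an assumed threshold
encoded = pd.get_dummies(df, columns=['colour'])        # one-hot encode a column

# Min-max scaling to the range [0, 1]
v = df['value']
df['value_scaled'] = (v - v.min()) / (v.max() - v.min())

print(df)
print(encoded)
```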

merging and joining on data frame
1.	How can you merge two pandas DataFrames with a common column or key using an inner join?
2.	What method allows you to combine two DataFrames with different columns into a single DataFrame with all columns?
3.	How do you perform a left join to combine two DataFrames, keeping all rows from the left DataFrame and matching rows from the right DataFrame?
4.	What is the process for executing a right join to include all rows from the right DataFrame and matching rows from the left DataFrame?
5.	How can you merge two DataFrames using an outer join to include all rows from both DataFrames, filling in missing values with NaN?
6.	What method allows you to merge DataFrames using a specific column as the key for joining, even if the column names are different in the two DataFrames?
7.	How do you concatenate two DataFrames vertically to stack them on top of each other?
8.	What operation allows you to combine two DataFrames horizontally by adding new columns from one DataFrame to another?
9.	How can you merge two DataFrames using multiple keys or columns for the join operation?
10.	What method allows you to merge DataFrames using an index in one DataFrame as the key for joining with the other DataFrame?
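
The join types asked about above can be compared side by side with a sketch like this one. The two small frames and the `student_id` column name are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cara']})
right = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 70]})

inner = pd.merge(left, right, on='id', how='inner')    # ids present in both
left_join = pd.merge(left, right, on='id', how='left')     # all rows from left
right_join = pd.merge(left, right, on='id', how='right')   # all rows from right
outer = pd.merge(left, right, on='id', how='outer')    # all rows, NaN where unmatched

# Key columns with different names in the two frames
right_renamed = right.rename(columns={'id': 'student_id'})
diff_keys = pd.merge(left, right_renamed, left_on='id', right_on='student_id')

# A column in one frame joined against the index of the other
by_index = pd.merge(left, right.set_index('id'), left_on='id', right_index=True)

stacked = pd.concat([left, left], ignore_index=True)        # vertical stacking
side_by_side = pd.concat([left, right[['score']]], axis=1)  # horizontal, aligned on index

print(outer)
```

Merging on several keys works the same way with a list, for example `on=['key1', 'key2']`, where both column names are placeholders.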

string operations on data frame
1.	How can you extract a specific substring from a column of text data in a DataFrame?
2.	What method allows you to convert all text in a column to lowercase in a DataFrame?
3.	How do you remove leading and trailing whitespaces from text data in a DataFrame?
4.	What operation can you use to concatenate two or more text columns in a DataFrame into a single column?
5.	How can you replace specific substrings or characters in a text column with new values in a DataFrame?
6.	What is the process for counting the occurrences of a specific word or character in a text column of a DataFrame?
7.	How do you split a text column into multiple columns based on a delimiter (e.g., comma) in a DataFrame?
8.	What method allows you to find the length of each string in a text column in a DataFrame?
9.	How can you check if a text column contains a specific substring or character and return a Boolean column indicating the result?
10.	What operation allows you to extract the first or last N characters from each string in a text column of a DataFrame?
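
Most of the string questions above map onto the `.str` accessor. The following sketch uses invented `first`, `last`, and `tags` columns to show a few of them.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [' Ann ', 'Bob'],
    'last': ['Lee', 'Ray'],
    'tags': ['a,b', 'c,a'],
})

df['first'] = df['first'].str.strip()                # trim whitespace
df['lower'] = df['first'].str.lower()                # convert to lowercase
df['full'] = df['first'] + ' ' + df['last']          # concatenate text columns
df['initials'] = df['first'].str[:1] + df['last'].str[:1]   # first character of each
df['replaced'] = df['tags'].str.replace(',', ';', regex=False)  # literal replacement
df['a_count'] = df['tags'].str.count('a')            # occurrences of a pattern
df['length'] = df['full'].str.len()                  # length of each string
df['has_c'] = df['tags'].str.contains('c', regex=False)     # Boolean column
parts = df['tags'].str.split(',', expand=True)       # one column per piece

print(df)
print(parts)
```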

datetime functions on data frame

1.	How can you extract the year component from a datetime column in a DataFrame?
2.	What method allows you to extract the month component from a datetime column?
3.	How do you find the day of the week for each date in a datetime column and create a new column with the results?
4.	What operation can you use to calculate the difference in days between two datetime columns in a DataFrame?
5.	How can you filter and select rows in a DataFrame based on a specific date range in a datetime column?
6.	What is the process for resampling daily time series data to a monthly frequency and calculating the mean for each month?
7.	How do you shift the datetime values in a column forward or backward by a specified number of days?
8.	What method allows you to check if a specific date in a datetime column falls on a weekend or a weekday?
9.	How can you create a new column in a DataFrame that represents the time difference between two datetime columns?
10.	What operation allows you to convert a datetime column to a specific timezone in a DataFrame?
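
For the datetime questions above, a sketch along these lines may help. The dates, the `start`/`end`/`value` column names, and the choice of the `Asia/Kolkata` time zone are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['2024-01-05', '2024-01-06', '2024-02-10']),
    'end': pd.to_datetime(['2024-01-08', '2024-01-16', '2024-02-11']),
    'value': [10, 20, 30],
})

df['year'] = df['start'].dt.year                    # year component
df['month'] = df['start'].dt.month                  # month component
df['weekday'] = df['start'].dt.day_name()           # day of the week as text
df['is_weekend'] = df['start'].dt.dayofweek >= 5    # Monday is 0, so 5 and 6 are the weekend
df['days'] = (df['end'] - df['start']).dt.days      # difference in whole days
df['shifted'] = df['start'] + pd.Timedelta(days=7)  # shift forward by a week

# Rows inside a date range (here: January 2024)
january = df[(df['start'] >= '2024-01-01') & (df['start'] < '2024-02-01')]

# Resample to month-start bins and average each month
monthly_mean = df.set_index('start')['value'].resample('MS').mean()

# Attach a time zone to naive timestamps, then convert to another one
df['start_utc'] = df['start'].dt.tz_localize('UTC')
df['start_kolkata'] = df['start_utc'].dt.tz_convert('Asia/Kolkata')

print(df[['start', 'weekday', 'is_weekend', 'days']])
print(monthly_mean)
```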

descriptive statistics on data frame
1.	How can you calculate the mean (average) of a specific numerical column in a DataFrame?
2.	What method allows you to find the median value for a numerical column in a DataFrame?
3.	How do you determine the sum of values in a numerical column in a DataFrame?
4.	What is the process for finding the minimum and maximum values in a numerical column in a DataFrame?
5.	How can you calculate the standard deviation of a numerical column in a DataFrame?
6.	What operation allows you to count the number of unique values in a categorical column in a DataFrame?
7.	How do you create a frequency distribution table for a categorical column in a DataFrame?
8.	What method allows you to find the mode (most common value) for a column in a DataFrame?
9.	How can you calculate the correlation between two numerical columns in a DataFrame?
10.	What operation allows you to generate a summary statistics table for all numerical columns in a DataFrame, including mean, min, max, and more?
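
The descriptive statistics questions above can be tried on a small invented table as follows (`height`, `weight`, and `grade` are assumed column names).

```python
import pandas as pd

df = pd.DataFrame({
    'height': [150, 160, 170, 180],
    'weight': [50, 60, 70, 80],
    'grade': ['A', 'B', 'A', 'C'],
})

mean_h = df['height'].mean()
median_h = df['height'].median()
total_w = df['weight'].sum()
min_h, max_h = df['height'].min(), df['height'].max()
std_h = df['height'].std()               # sample standard deviation (ddof=1)
n_grades = df['grade'].nunique()         # number of unique categories
freq = df['grade'].value_counts()        # frequency table
mode_grade = df['grade'].mode()[0]       # mode() returns a Series; take the first entry
corr = df['height'].corr(df['weight'])   # Pearson correlation by default
summary = df.describe()                  # summary table for the numeric columns

print(summary)
print(freq)
```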

