In [31]:
import pandas as pd 
import numpy as np

# Question 1: Which type of data can be used while creating a series object in pandas? 
------------------------------------

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). 
The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
    ``` s = pd.Series(data, index=index) ```
Here, data can be many different things:
* a Python dict
* a Python list
* an ndarray (NumPy array)
* a scalar value (like 5)
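
As a minimal sketch of the four kinds of input listed above, the snippet below builds one Series from each. The variable names (`from_dict`, `from_list`, `from_array`, `from_scalar`) and the sample values are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a list: a default integer index (0, 1, 2, ...) is assigned.
from_list = pd.Series([10, 20, 30])

# From a NumPy array with explicit labels; the lengths must match.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["x", "y", "z"])

# From a scalar: the value is repeated once for every label in the index.
from_scalar = pd.Series(5, index=["p", "q", "r"])

for s in (from_dict, from_list, from_array, from_scalar):
    print(s, end="\n\n")
```

Note that the scalar case needs an index; it is the index that decides how many times the value is repeated.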

The passed index is a list of axis labels. Thus, this separates into a few cases depending on what `data` is.

------------------------------------
>    ``` s = pd.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=_NoDefault.no_default) ```
> **Parameters:**
> data : Array-like, Iterable, dict, or scalar value.
> Contains data stored in Series. If data is a dict, argument order is maintained.
> 
> Source: pandas 2.2.0 documentation
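
To make the interaction between `data` and `index` more concrete, here is a small sketch under assumed sample values. It shows that a dict keeps its insertion order, that passing an `index` alongside a dict selects values by label (labels missing from the dict become `NaN`), and how the `dtype` and `name` parameters can be set. The names `s1`, `s2`, and `s3` are made up for illustration.

```python
import numpy as np
import pandas as pd

data = {"b": 2, "a": 1, "c": 3}

# Without an index, the dict's insertion order is kept: b, a, c.
s1 = pd.Series(data)

# With an index, values are looked up by label. "d" is not a key in the
# dict, so it becomes NaN, which also turns the dtype into float64.
s2 = pd.Series(data, index=["a", "b", "d"])

# dtype forces the element type; name labels the Series itself.
s3 = pd.Series([1, 2, 3], dtype="float64", name="values")

print(s1, s2, s3, sep="\n\n")
```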

# Question 2: Create a Series with the month numbers as data and the month names as index labels.
-----------------------------
## Using a dictionary
1. **Create a Dictionary:** Start by defining a Python dictionary where keys represent the index labels and values represent the data points.
2. **Create a Series:** Use the `pd.Series()` constructor and pass the dictionary as an argument.
3. **Print or Use the Series:** You can now print the Series or perform any desired operations with it.
-----------------------------
## Using a list
1. **Create Lists for Data and Index:** Define separate lists for the data points and the corresponding index labels.
2. **Create a Series:** Use the `pd.Series()` constructor and pass the data list as the first argument and the index list as the index parameter.
3. **Print or Use the Series:** You can now print the Series or use it for further analysis.
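
As an optional variation on the list approach, the month names do not have to be typed by hand. The sketch below assumes the standard-library `calendar` module and the default (English) locale; it is one possible shortcut, not the method the exercise requires. It also shows two ways of using the Series afterwards: a single label lookup and a label slice.

```python
import calendar

import pandas as pd

# calendar.month_name has an empty string at position 0, so skip it.
# The names follow the current locale (English by default).
months = list(calendar.month_name)[1:]

s = pd.Series(range(1, 13), index=months)

# Look up a single month by its label.
print(s[months[2]])

# Label-based slices include both endpoints.
print(s.loc[months[5]:months[7]])
```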

In [32]:
# Using a dictionary
# Step 1: define a dict whose keys are the index labels
month_data = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}
# Step 2: build the Series from the dict
s = pd.Series(month_data)
# Step 3: print the Series
print(s)

January       1
February      2
March         3
April         4
May           5
June          6
July          7
August        8
September     9
October      10
November     11
December     12
dtype: int64


In [33]:
# Using a list
# Step 1: define the index labels and the data as separate lists
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
month_numbers = list(range(1, 13))
# Step 2: build the Series, passing the labels as the index
s = pd.Series(month_numbers, index=months)
# Step 3: print the Series
print(s)

January       1
February      2
March         3
April         4
May           5
June          6
July          7
August        8
September     9
October      10
November     11
December     12
dtype: int64


In [34]:
# Step 1: map each group name to its number of students
fresh_batch_groups = {
    'MATDAIS': 25,
    'MATMIE': 30,
    'COMIE': 28,
    'COMEC': 32
}

# Step 2: build the Series from the dict
students_series = pd.Series(fresh_batch_groups)

# Step 3: print a caption, then the Series
print("Number of students in fresh batch groups:")
print(students_series)

Number of students in fresh batch groups:
MATDAIS    25
MATMIE     30
COMIE      28
COMEC      32
dtype: int64


# Question 4: Write a Pandas program to create and display a DataFrame from the specified dictionary data, using the given index labels.
--------------------------------------

Sample Python dictionary data and list labels:

    exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
            'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
            'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
            'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] 

--------------------------------------
## Step-by-step explanation:
1. **Define the dictionary**: A dictionary named `exam_data` is defined. It contains four key-value pairs:

    * `name`: A list of names.
    * `score`: A list of scores.
    * `attempts`: A list indicating the number of attempts.
    * `qualify`: A list indicating whether the person qualifies, with values 'yes' or 'no'.

2. **Define the Index Labels**: Another list named `labels` is defined, containing labels for the index of the DataFrame. It has ten elements, one for each row of data.
3. **Create DataFrame**: The `pd.DataFrame()` function is called to create a DataFrame named `df`. It takes the dictionary `exam_data` as its data argument and `labels` as the index argument. This means that the DataFrame will have the provided labels as its index.
4. **Print DataFrame**: Finally, the `df` DataFrame is printed.
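
After building a DataFrame this way, a few attributes help confirm that it has the expected shape and column types. The sketch below uses a shortened, three-row version of the exercise data (an assumption made to keep the example small). It also illustrates why an integer score such as 9 is displayed as 9.0 when printed: a column containing `np.nan` is stored as `float64`.

```python
import numpy as np
import pandas as pd

# Shortened sample data for illustration only.
sample = {
    "name": ["Anastasia", "Dima", "James"],
    "score": [12.5, 9, np.nan],
    "attempts": [1, 3, 3],
}
df = pd.DataFrame(sample, index=["a", "b", "d"])

print(df.shape)                  # (number of rows, number of columns)
print(list(df.index))            # the labels passed as the index
print(df.dtypes)                 # score is float64 because of the NaN
print(df["score"].isna().sum())  # count of missing scores
```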

In [35]:
# Step 1: define the dictionary of column data
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
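# Step 2: define the index labels, one per row
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Step 3: create the DataFrame with the labels as its index
df = pd.DataFrame(exam_data, index=labels)

# Step 4: print the DataFrame
print(df)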

        name  score  attempts qualify
a  Anastasia   12.5         1     yes
b       Dima    9.0         3      no
c  Katherine   16.5         2     yes
d      James    NaN         3      no
e      Emily    9.0         2      no
f    Michael   20.0         3     yes
g    Matthew   14.5         1     yes
h      Laura    NaN         1      no
i      Kevin    8.0         2      no
j      Jonas   19.0         1     yes


# Question 5: Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.
---------------------------------------
Sample Python dictionary data and list labels:
```
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
            'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
            'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
            'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```
Expected Output:
```
Number of attempts in the examination is greater than 2:
      name  score  attempts qualify
b     Dima    9.0         3      no
d    James    NaN         3      no
f  Michael   20.0         3     yes
```
--------------------------------------
## Step-by-step explanation:
1. **Define the Data**: We have a dictionary named `exam_data`, which contains information about students' names, scores, the number of attempts, and whether they qualify. Additionally, we have a list named `labels` which represents the index labels for the DataFrame.
2. **Create DataFrame**: We create a DataFrame named `df` using the `pd.DataFrame()` function. This function takes `exam_data` as the data and `labels` as the index. This DataFrame will have rows indexed by the provided labels and columns named after the keys of the `exam_data` dictionary.
3. **Select Rows Based on Condition**: We use boolean indexing to select rows where the number of attempts (`attempts` column) is greater than 2. This is achieved by passing a boolean condition `df['attempts'] > 2` inside square brackets.
4. **Print Filtered DataFrame**: Finally, we print the filtered DataFrame `result`, which contains only the rows where the number of attempts is greater than 2.
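
To show what boolean indexing does under the hood, the sketch below first prints the intermediate boolean mask and then applies it in three equivalent ways. It uses a shortened, four-row version of the exercise data, and the names `mask`, `by_brackets`, `by_loc`, `by_query`, and `both` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Shortened sample data for illustration only.
df = pd.DataFrame(
    {
        "name": ["Anastasia", "Dima", "Katherine", "James"],
        "score": [12.5, 9, 16.5, np.nan],
        "attempts": [1, 3, 2, 3],
    },
    index=["a", "b", "c", "d"],
)

# The comparison yields a boolean Series aligned with df's index.
mask = df["attempts"] > 2
print(mask)

# Three equivalent ways to keep only the rows where the mask is True.
by_brackets = df[mask]
by_loc = df.loc[mask]
by_query = df.query("attempts > 2")

# Conditions can be combined with & (and) or | (or); each condition
# needs its own parentheses.
both = df[(df["attempts"] > 2) & (df["score"].notna())]
print(both)
```

The bracket form is the shortest; `.loc` is often preferred when columns are selected in the same step, for example `df.loc[mask, "name"]`.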

In [37]:
# Step 1: define the data and the index labels
exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

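# Step 2: create the DataFrame with the labels as its index
df = pd.DataFrame(exam_data, index=labels)

# Step 3: keep only the rows where attempts is greater than 2
result = df[df['attempts'] > 2]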

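# Step 4: print a caption, then the filtered rows
# (separate print calls avoid the stray leading space that the default
# separator inserts before the table)
print('Number of attempts in the examination is greater than 2:')
print(result)

Number of attempts in the examination is greater than 2:
      name  score  attempts qualify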
b     Dima    9.0         3      no
d    James    NaN         3      no
f  Michael   20.0         3     yes
