# Pandas Assignment

Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)
print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [4]:
type(series)

pandas.core.series.Series

Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [5]:
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_series = pd.Series(my_list)
print(my_series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

In [6]:
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In [7]:
type(df)

pandas.core.frame.DataFrame

Q4. What is a 'DataFrame' in pandas, and how is it different from a pandas.Series? Explain with an example.

A DataFrame in pandas is a two-dimensional labeled data structure that can be thought of as a table or a spreadsheet. It consists of columns, where each column represents a different variable or feature, and rows, where each row represents an individual record or observation. In other words, a DataFrame is a collection of pandas Series objects that share a common index.

A pandas Series, on the other hand, is a one-dimensional labeled array that can store data of any type (integer, string, float, etc.). It can be considered as a single column of a DataFrame. Each element in a Series has a corresponding label or index that allows for easy identification and access.

In [8]:
# Creating a Series
my_series = pd.Series([10, 20, 30, 40, 50])

# Creating a DataFrame
my_df = pd.DataFrame({'Numbers': [10, 20, 30, 40, 50]})

print("Series:")
print(my_series)
print()

print("DataFrame:")
print(my_df)

Series:
0    10
1    20
2    30
3    40
4    50
dtype: int64

DataFrame:
   Numbers
0       10
1       20
2       30
3       40
4       50


In [10]:
type(my_series)

pandas.core.series.Series

In [11]:
type(my_df)

pandas.core.frame.DataFrame
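
The relationship described above, where a DataFrame acts as a collection of Series sharing one index, can be checked directly. The short sketch below reuses the `my_df` example and introduces one illustrative name, `column`, which is not part of the original answer.

```python
import pandas as pd

# Same single-column DataFrame as in the example above.
my_df = pd.DataFrame({'Numbers': [10, 20, 30, 40, 50]})

# Selecting one column by name gives back a Series, not a DataFrame.
column = my_df['Numbers']
print(type(column).__name__)

# A DataFrame has two dimensions; a Series has one.
print(my_df.ndim, column.ndim)

# The selected column carries the same index as its parent DataFrame.
print(column.index.equals(my_df.index))
```

Selecting with a list of names instead, as in `my_df[['Numbers']]`, keeps the result two-dimensional and returns a DataFrame.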

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?

| Use Case | Method | Description | Parameters |
|----------|--------|-------------|------------|
| Data Reading and Writing | `pd.read_csv()` | Read a CSV file into a DataFrame. | `filepath_or_buffer`, `sep`, `header`, `names`, `dtype`, `na_values`, `skiprows`, `nrows`, etc. |
| Data Reading and Writing | `pd.read_excel()` | Read an Excel file into a DataFrame. | `io`, `sheet_name`, `header`, `names`, `dtype`, `na_values`, `skiprows`, `nrows`, etc. |
| Data Reading and Writing | `df.to_csv()` | Write a DataFrame to a CSV file. | `path_or_buf`, `sep`, `na_rep`, `columns`, `header`, `index`, `mode`, etc. |
| Data Reading and Writing | `df.to_excel()` | Write a DataFrame to an Excel file. | `excel_writer`, `sheet_name`, `header`, `index`, `startrow`, `startcol`, `engine`, etc. |
| Data Exploration and Manipulation | `df.head()` | Display the first few rows of a DataFrame. | `n` |
| Data Exploration and Manipulation | `df.tail()` | Display the last few rows of a DataFrame. | `n` |
| Data Exploration and Manipulation | `df.shape` | Get the dimensions (rows, columns) of the DataFrame. This is an attribute, not a method. | N/A |
| Data Exploration and Manipulation | `df.info()` | Display a summary of the DataFrame's structure and data types. | `verbose`, `show_counts`, `memory_usage` |
| Data Exploration and Manipulation | `df.describe()` | Generate descriptive statistics of the DataFrame. | `percentiles`, `include`, `exclude` |
| Data Selection and Filtering | `df[col]` or `df.loc[:, col]` | Select a single column or a list of columns from a DataFrame. | N/A |
| Data Selection and Filtering | `df.iloc[row_index]` | Select a single row by its integer position. | N/A |
| Data Selection and Filtering | `df.loc[condition]` | Select rows for which a boolean condition is true. | N/A |
| Data Selection and Filtering | `df[df[col] > value]` | Select rows where a column value satisfies a given condition. | N/A |
| Data Aggregation and Grouping | `df.groupby(col)` | Group rows based on unique values in a column. | `by`, `level`, `sort`, `as_index`, `dropna` |
| Data Aggregation and Grouping | `grouped_df.agg(func)` | Apply aggregation functions (e.g., sum, mean, count) to grouped data. | `func` |
| Data Aggregation and Grouping | `df.pivot_table()` | Create a pivot table based on the DataFrame's values. | `values`, `index`, `columns`, `aggfunc`, `fill_value` |
| Data Aggregation and Grouping | `df.merge()` | Merge two DataFrames based on a common column. | `right`, `how`, `on`, `left_on`, `right_on`, `left_index`, `right_index`, etc. |
| Data Cleaning and Preprocessing | `df.dropna()` | Remove rows with missing values. | `axis`, `how`, `thresh`, `subset`, `inplace` |
| Data Cleaning and Preprocessing | `df.fillna(value)` | Fill missing values with a specified value. | `value`, `axis`, `inplace`, `limit` |
| Data Cleaning and Preprocessing | `df.duplicated()` | Flag duplicated rows as a boolean Series; `df.drop_duplicates()` removes them. | `subset`, `keep` |
| Data Cleaning and Preprocessing | `df.replace(old, new)` | Replace specific values in the DataFrame. | `to_replace`, `value`, `inplace`, `regex` |
| Data Visualization | `df.plot()` | Create basic plots (line, bar, scatter, etc.) from the DataFrame. | `x`, `y`, `kind`, `ax`, `subplots`, `layout`, `title`, `grid`, `legend`, `style`, `colormap`, etc. |
| Data Visualization | `df.hist()` | Generate histograms for the DataFrame's columns. | `column`, `by`, `grid`, `xlabelsize`, `ylabelsize`, `ax`, `sharex`, `sharey`, etc. |
| Data Visualization | `df.boxplot()` | Create box plots to visualize the distribution of data. | `column`, `by`, `ax`, `fontsize`, `grid`, `layout`, etc. |
| Data Visualization | `df.plot(kind='pie')` | Generate a pie chart from the DataFrame. | `y`, `subplots`, `figsize`, `autopct`, `shadow`, `startangle`, `ax`, `legend`, etc. |
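
As an example of when these functions might be used, consider summarising sales by region when some amounts are missing. The sketch below combines `fillna()` and `groupby()`. The `sales` data, its column names, and the choice to treat a missing amount as 0 are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical sales records; the None in 'amount' is stored as NaN.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, None, 50.0, 70.0],
})

# Treat the missing amount as 0 (an assumption made for this illustration).
cleaned = sales.fillna({"amount": 0.0})

# Group the rows by region and compute the mean amount in each group.
mean_by_region = cleaned.groupby("region")["amount"].mean()
print(mean_by_region)
```

Without the `fillna()` step, `mean()` would skip the missing value, so the South average would be computed from a single row. Whether filling with 0 is appropriate depends on what a missing value means in the data at hand.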


Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?

Among the options provided (Series, DataFrame, Panel), all three are value-mutable. They differ in whether their size can change.

In pandas, mutability refers to the ability to modify an object after its creation. The pandas documentation distinguishes value mutability (changing the stored values) from size mutability (changing the number of elements or columns).

Series: A pandas Series is value-mutable. You can change its values by assigning to specific index labels or positions, and you can replace its index. The documentation describes its length as fixed, so a Series is usually classed as size-immutable.

DataFrame: A pandas DataFrame is both value-mutable and size-mutable. You can modify the values in the DataFrame, add or remove columns, and update the index or column labels.

Panel: Panel was a three-dimensional container and was likewise mutable. It was deprecated in pandas 0.20 and removed in pandas 0.25; the recommended practice is to use a DataFrame with a MultiIndex to represent similarly structured data.

So, in summary:

Series is value-mutable, with a length that the documentation describes as fixed.
DataFrame is value-mutable and size-mutable.
Panel was also mutable, but it has been removed from pandas.
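
A minimal sketch of what modifying an object after its creation looks like for a Series and a DataFrame. The variable names and values here are illustrative and not taken from the question.

```python
import pandas as pd

# Changing a value inside an existing Series.
s = pd.Series([4, 8, 15])
s.iloc[0] = 99  # overwrite the element at position 0
print(s.tolist())

# Changing a value and adding a column in an existing DataFrame.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df.loc[0, "Age"] = 26              # value change: update one cell
df["Gender"] = ["Female", "Male"]  # size change: insert a new column
print(df.shape)
```

Both objects are updated in place: no new Series or DataFrame has to be created to hold the changes.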

Q7. Create a DataFrame using multiple Series. Explain with an example.

In [12]:
# Creating Series for Name, Age, and Gender
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

# Creating a DataFrame using the Series
data = {'Name': name_series, 'Age': age_series, 'Gender': gender_series}
df = pd.DataFrame(data)

In [13]:
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In the code above, three separate Series objects are created: name_series, age_series, and gender_series. Each Series represents a column in the desired DataFrame.

Next, a dictionary named data is created, where the keys are the column names ('Name', 'Age', 'Gender') and the values are the corresponding Series.

Finally, the pd.DataFrame() function is used to convert the dictionary into a DataFrame, which is stored in the variable df. The resulting DataFrame contains the columns 'Name', 'Age', and 'Gender', with the data from the respective Series.

This example demonstrates how multiple Series can be combined to create a DataFrame, allowing you to organize and analyze data in a tabular format.
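
One detail worth knowing when combining Series this way: pandas aligns them on their index labels rather than on position. In the example above all three Series share the default index 0, 1, 2, so nothing unusual happens. The sketch below uses hypothetical labels `'a'`, `'b'`, `'c'` to show what happens when one Series is missing a label.

```python
import pandas as pd

# Two Series with explicit, partly overlapping index labels (illustrative data).
names = pd.Series(["Alice", "Bob", "Claire"], index=["a", "b", "c"])
ages = pd.Series([25, 30], index=["a", "b"])  # no entry for label 'c'

# The DataFrame index is the union of the labels; gaps are filled with NaN.
aligned = pd.DataFrame({"Name": names, "Age": ages})
print(aligned)
```

Because the `Age` column now holds a missing value, pandas stores it as floating point rather than integer.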