**Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.**

In [6]:
import pandas as pd
series = pd.Series([4,8,15,16,23,42])
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

**Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the variable, and print the result.**

In [7]:
my_list = [11,12,13,14,15,16,17,18,19,20]
my_series = pd.Series(my_list)
print(my_series)

0    11
1    12
2    13
3    14
4    15
5    16
6    17
7    18
8    19
9    20
dtype: int64


**Q3. Create a Pandas DataFrame that contains the following data:**

| Name | Age | Gender |
|---|---|---|
| Alice | 25 | Female |
| Bob | 30 | Male |
| Claire | 27 | Female |

**Then, print the DataFrame.**

In [12]:
my_dict = {"Name": ["Alice", "Bob", "Claire"],
           "Age": [25, 30, 27],
           "Gender": ["Female", "Male", "Female"]}
df = pd.DataFrame(my_dict)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


**Q4. What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.**


In Pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. It is similar to a spreadsheet or a SQL table, where data is organized into rows and columns. Each column in a DataFrame can be thought of as a Pandas Series, which is a one-dimensional labeled array.

Here's how a DataFrame differs from a Series:

**Dimensionality:**

- Series: A Series is a one-dimensional data structure. It can be thought of as a single column or a single row of data with an index. Essentially, it's a labeled array.
- DataFrame: A DataFrame is a two-dimensional data structure that consists of rows and columns. It can hold multiple columns, each of which can be a Series.

**Use Cases:**

- Series: Series is typically used to represent a single variable, such as a column in a dataset. It's useful for performing operations on a single set of data.
- DataFrame: DataFrame is used to store and manipulate structured data where you have multiple variables (columns) related to each other. It's ideal for working with datasets where you need to perform operations on multiple variables together.

**Indexing:**

- Series: A Series has a single index that labels the elements.
- DataFrame: A DataFrame has row and column indices. You can access data using both row and column labels.
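
A short sketch of the indexing difference. The names and values here are illustrative only: a Series needs one label to locate a value, a DataFrame needs a row label and a column label, and selecting a single DataFrame column gives back a Series.

```python
import pandas as pd

# A Series has a single index: one label selects one value
s = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")
print(s["Bob"])  # 30

# A DataFrame has a row index and a column index: .loc takes (row, column)
df_idx = pd.DataFrame(
    {"Age": [25, 30, 35], "Salary": [50000, 60000, 70000]},
    index=["Alice", "Bob", "Charlie"],
)
print(df_idx.loc["Bob", "Salary"])  # 60000

# Selecting one column of a DataFrame returns a Series
age_col = df_idx["Age"]
print(type(age_col).__name__)  # Series
```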

In summary, a Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional tabular data structure that can hold multiple Series (columns) together. DataFrame is the more versatile and commonly used data structure in Pandas for working with structured datasets.

In [14]:
#Example
import pandas as pd

# Creating a Series
series_data = pd.Series([10, 20, 30, 40, 50], name='Values')
print("Series:")
print(series_data)

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)
print("\nDataFrame:")
print(df)

"""In this example, series_data is a Series representing a single column of values. 
df is a DataFrame representing a structured dataset with multiple columns (Name, Age, Salary)."""

Series:
0    10
1    20
2    30
3    40
4    50
Name: Values, dtype: int64

DataFrame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000


'In this example, series_data is a Series representing a single column of values. \ndf is a DataFrame representing a structured dataset with multiple columns (Name, Age, Salary).'

**Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?**


Pandas provides a wide range of functions and methods for manipulating data in a DataFrame. Here are some common ones, grouped by task, followed by worked examples on the DataFrame created in Q4:

**Selecting Columns:**
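- df['column_name'] or df.column_name: Select a single column (attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method).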
- df[['col1', 'col2']]: Select multiple columns.
    
**Filtering Rows:**
- df[df['column'] > value]: Filter rows based on a condition.

**Sorting Data:**
- df.sort_values('column_name'): Sort DataFrame by a specific column.
- df.sort_values('column_name', ascending=False): Sort in descending order.

**Grouping and Aggregating:**
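- df.groupby('column_name').mean(): Group by a column and calculate the mean of the other columns within each group. In pandas 2.0 and later, pass numeric_only=True (or select the numeric columns first) when the DataFrame also contains non-numeric columns; otherwise a TypeError is raised.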

**Adding and Dropping Columns:**
- df['new_column'] = ...: Add a new column to the DataFrame.
- df.drop('column_name', axis=1): Drop a column.

**Handling Missing Data:**
- df.dropna(): Remove rows with missing values.
- df.fillna(value): Fill missing values with a specific value.
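
Grouping is most useful when the key column contains repeated values. A minimal sketch, assuming a hypothetical `Dept` column that is not part of the notebook's data, that chains filtering and sorting and then aggregates one numeric column per group:

```python
import pandas as pd

# Hypothetical staff data; the "Dept" column is illustrative only
staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Dept": ["Sales", "IT", "Sales", "IT"],
    "Salary": [50000, 60000, 70000, 80000],
})

# Filter, then sort: rows with Salary above 55000, highest salary first
high_paid = staff[staff["Salary"] > 55000].sort_values("Salary", ascending=False)
print(high_paid)

# Group by department and average only the numeric Salary column
mean_salary = staff.groupby("Dept")["Salary"].mean()
print(mean_salary)
```

Selecting `["Salary"]` before `.mean()` sidesteps the non-numeric `Name` column, which is an alternative to `numeric_only=True`.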


In [15]:
#Example
df

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000


In [19]:
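#Selecting Columns:
print(df['Name'])            # one label -> Series
print(df[['Age','Salary']])  # list of labels -> DataFrame

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object
   Age  Salary
0   25   50000
1   30   60000
2   35   70000
3   40   80000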


In [20]:
#Filtering Rows:
df[df['Age']>30]

      Name  Age  Salary
2  Charlie   35   70000
3    David   40   80000


In [23]:
#Sorting Data:
df.sort_values("Name",ascending=False)

      Name  Age  Salary
3    David   40   80000
2  Charlie   35   70000
1      Bob   30   60000
0    Alice   25   50000


In [26]:
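#Grouping and Aggregating:
# numeric_only=True skips the non-numeric Name column (required in pandas 2.0+).
# Every Salary value is unique here, so each group contains exactly one row.
df.groupby('Salary').mean(numeric_only=True)

         Age
Salary
50000   25.0
60000   30.0
70000   35.0
80000   40.0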
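
The cells above demonstrate selecting, filtering, sorting, and grouping, but not the column and missing-data methods from the list. A minimal sketch on small, made-up data (Bob's salary is deliberately missing) showing `dropna`, `fillna`, adding a column, and `drop`:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; Bob's salary is missing (NaN)
emp = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, np.nan, 70000],
})

# Option 1: drop rows that contain any missing value
complete_rows = emp.dropna()

# Option 2: fill missing values instead (here with the column mean)
filled = emp.fillna({"Salary": emp["Salary"].mean()})

# Add a derived column, then drop it again (axis=1 targets columns)
filled["Bonus"] = filled["Salary"] * 0.10
without_bonus = filled.drop("Bonus", axis=1)

print(complete_rows)
print(filled)
print(list(without_bonus.columns))
```

Whether to drop or fill depends on the analysis: dropping loses Bob's row entirely, while filling with the mean keeps the row at the cost of an estimated value.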


**Q6. Which of the following is mutable in nature: Series, DataFrame, or Panel?**

All three are **value-mutable**: the values they hold can be changed in place. They differ in **size mutability**:

- Series: Value-mutable but size-immutable. You can change individual elements (for example, s[0] = 100), but the pandas documentation treats the length of a Series as fixed.

- DataFrame: Both value-mutable and size-mutable. You can modify values and insert or delete columns and rows.

- Panel: The Panel was a three-dimensional data structure in earlier versions of pandas and was also value- and size-mutable. It was deprecated in pandas 0.20.0 and removed in pandas 0.25.0; the recommended replacements for 3-D data are a DataFrame with a MultiIndex or the xarray package.

In summary, all three structures are value-mutable, but only the DataFrame (and the now-removed Panel) is size-mutable, so the DataFrame is the structure to use when you need to add or remove columns.
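
A minimal sketch of value versus size mutability, using small made-up data: an element of a Series is changed in place, and a DataFrame gains and then loses a column.

```python
import pandas as pd

# Series: value-mutable, an existing element is overwritten in place
s = pd.Series([1, 2, 3])
s[0] = 100
print(s.tolist())  # [100, 2, 3]

# DataFrame: value-mutable and size-mutable
df_m = pd.DataFrame({"A": [1, 2, 3]})
df_m["B"] = [4, 5, 6]          # a new column is inserted
cols_after_insert = list(df_m.columns)
df_m.loc[0, "A"] = 10          # a single cell is changed
df_m = df_m.drop(columns="B")  # a column is removed (drop returns a new DataFrame)
print(df_m)
```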

**Q7. Create a DataFrame using multiple Series. Explain with an example.**

In [27]:
import pandas as pd

# Create multiple Series
series1 = pd.Series([1, 2, 3, 4], name='Column1')
series2 = pd.Series(['A', 'B', 'C', 'D'], name='Column2')

# Combine the Series into a DataFrame
df = pd.DataFrame({'Column1': series1, 'Column2': series2})

# Print the DataFrame
print(df)

   Column1 Column2
0        1       A
1        2       B
2        3       C
3        4       D


In this example:

- We create two Pandas Series, series1 and series2, with different types of data (integer and string values).

- We then create a DataFrame df by passing a dictionary to the pd.DataFrame constructor. Each key-value pair in the dictionary represents a column in the DataFrame. The keys ('Column1' and 'Column2') become the column names, and the values are the Series we created earlier.

- The resulting DataFrame has two columns, 'Column1' and 'Column2', with the data from the respective Series. This is a simple example, but in practice, you can create DataFrames with multiple columns and different data types to represent more complex datasets.
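
Two related points, sketched below with the same Series plus one made-up Series (`short`, illustrative only): `pd.concat` with `axis=1` builds the DataFrame directly from the Series and uses each Series' `name` as its column label, and when Series are combined they are aligned on their index labels rather than by position, with missing labels filled with NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], name="Column1")
series2 = pd.Series(["A", "B", "C", "D"], name="Column2")

# Place the Series side by side; each Series' .name becomes its column label
df_concat = pd.concat([series1, series2], axis=1)
print(df_concat)

# Series are aligned on index labels, not on position; gaps become NaN
short = pd.Series([10.5, 20.5], index=[1, 3], name="Column3")
df_aligned = pd.DataFrame({"Column1": series1, "Column3": short})
print(df_aligned)
```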