In [1]:
# Q1: Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

import pandas as pd

# create a Pandas Series with the given data
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# print the series
print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [2]:
# Q2: Create a list variable containing 10 elements, apply the pandas.Series function to it, and print the result.
import pandas as pd

# create a list of 10 elements
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# convert the list to a Pandas Series
my_series = pd.Series(my_list)

# print the series
print(my_series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


In [3]:
# Q3: Create a Pandas DataFrame that contains the following data:
import pandas as pd

# create a dictionary with the data
data = {"name": ["Alice", "Bob", "Claire"],
        "age": [25, 30, 27],
        "gender": ["female", "male",  "female"]}

# convert the dictionary to a Pandas DataFrame
df = pd.DataFrame(data)

# print the DataFrame
print(df)


     name  age  gender
0   Alice   25  female
1     Bob   30    male
2  Claire   27  female


# Q4: What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.

## In pandas, a DataFrame is a 2-dimensional labeled data structure that is used to represent tabular data. It is similar to a spreadsheet or SQL table, where each column can have a different data type (numeric, string, boolean, and so on) and each row represents an observation or record. A DataFrame can be created from a variety of data sources, such as lists, dictionaries, and other DataFrames.

+ On the other hand, a pandas Series is a 1-dimensional labeled data structure that represents a single column of data. It is similar to a list or a 1-dimensional array in NumPy, but with additional indexing functionality. Each element in a Series has a label (an index) that can be used to access the element's value.

In [4]:
# Here is an example that demonstrates the difference between a DataFrame and a Series:

import pandas as pd

# create a dictionary with some data
data = {"name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "gender": ["female", "male", "male"]}

# create a DataFrame from the dictionary
df = pd.DataFrame(data)

# create a Series from the "name" column of the DataFrame
name_series = df["name"]

# print the DataFrame and the Series
print("DataFrame:")
print(df)
print("\nSeries:")
print(name_series)


DataFrame:
      name  age  gender
0    Alice   25  female
1      Bob   30    male
2  Charlie   35    male

Series:
0      Alice
1        Bob
2    Charlie
Name: name, dtype: object
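
The label-based access described for a Series can be illustrated with a minimal sketch. The names used as index labels and the variable names `ages` and `bob_age` are chosen here for illustration and do not come from the example above.

```python
import pandas as pd

# hypothetical labels: use the names as the index instead of 0, 1, 2
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"])

# look up a single value by its label
bob_age = ages["Bob"]
print(bob_age)

# a Series has one axis, a DataFrame has two
df = pd.DataFrame({"age": ages})
print(ages.ndim, df.ndim)

# selecting one column of a DataFrame gives back a Series
print(type(df["age"]).__name__)
```

Selecting a single column, as in `df["age"]`, is the usual way a Series is obtained from a DataFrame.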


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? 
# Can you give an example of when you might use one of these functions?
### Pandas provides a wide range of functions for manipulating data in a DataFrame. Here are some common functions:

1. 'head()' and 'tail()': These functions allow you to view the first or last n rows of a DataFrame, respectively. They are useful for quickly inspecting the data to make sure it was loaded correctly, or to get a sense of what the data looks like.

2. 'describe()': This function provides summary statistics for the columns in the DataFrame (by default only the numeric ones), such as count, mean, standard deviation, minimum, maximum, and quartiles. It is useful for getting a quick overview of the data and spotting potential issues: a count lower than the number of rows indicates missing values, and an extreme minimum or maximum can point to outliers.

3. 'info()': This function provides information about the DataFrame, such as the data types of each column and the number of non-null values. It is useful for checking the data types and making sure they are correct.

4. 'groupby()': This function allows you to group the data in the DataFrame by one or more columns, and then apply a function to each group. It is useful for calculating aggregate statistics for each group or for creating new columns based on the groupings.

5. 'apply()': On a Series (a single column), this function applies a function to each element. On a DataFrame, it applies a function to each column or, with axis=1, to each row. It is useful for transforming the data, such as converting strings to numbers, or for creating new columns based on the existing columns.

6. 'sort_values()': This function allows you to sort the DataFrame by one or more columns. It is useful for reordering the data to make it easier to analyze or visualize.

7. 'pivot_table()': This function allows you to create a pivot table from the data, which is a summary table that shows the relationship between two or more variables. It is useful for analyzing the relationships between variables and identifying patterns in the data.
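
A few of the functions listed above can be sketched on a small table. The data and the variable names below are made up for this illustration.

```python
import pandas as pd

# illustrative sales data (invented for this sketch)
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "sales": [100, 200, 300, 400],
})

# first two rows of the table
first_two = df.head(2)

# summary statistics for the numeric "sales" column
summary = df["sales"].describe()

# element-wise transformation of one column
doubled = df["sales"].apply(lambda x: x * 2)

# rows ordered from highest to lowest sales
by_sales = df.sort_values("sales", ascending=False)

# one row per region, one column per product, summed sales in the cells
pivot = df.pivot_table(index="region", columns="product",
                       values="sales", aggfunc="sum")

print(first_two)
print(summary)
print(pivot)
```

Each call returns a new object and leaves `df` unchanged, so the steps can be tried in any order.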

### When you might use one of these functions:

+ Suppose you have a DataFrame that contains information about the sales of products in different regions. You want to calculate the total sales for each region and then sort the regions in descending order by their total sales. To do this, you could use the 'groupby()', 'sum()', and 'sort_values()' functions as follows:



In [5]:
import pandas as pd

# create a DataFrame with some sales data
data = {"region": ["East", "West", "North", "South", "East", "West", "North", "South"],
        "product": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "sales": [100, 200, 150, 250, 300, 400, 350, 450]}
df = pd.DataFrame(data)

# group the data by region and sum the sales for each region
region_sales = df.groupby("region")["sales"].sum()

# sort the regions by their total sales in descending order
region_sales_sorted = region_sales.sort_values(ascending=False)

# print the sorted region sales
print(region_sales_sorted)


region
South    700
West     600
North    500
East     400
Name: sales, dtype: int64


# Q6: Which of the following are mutable in nature: Series, DataFrame, Panel?

## In pandas, both Series and DataFrame are mutable in nature. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current versions.

+ This means that you can modify the data within a Series or DataFrame after it has been created, either by changing the values of existing elements or by adding or deleting elements. For example, you can add a new column to a DataFrame, or you can change the value of a specific element in a Series. However, every variable that refers to the same object sees the change, which can affect downstream calculations, so in-place modification should be done with care. When the original data must be preserved, work on a copy made with 'copy()'.
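
A minimal sketch of these modifications follows. The data and the names `s`, `snapshot`, and the "senior" column are illustrative assumptions.

```python
import pandas as pd

# change one value of a Series in place (by position)
s = pd.Series([4, 8, 15])
s.iloc[1] = 80

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# keep an independent copy before modifying the DataFrame
snapshot = df.copy()

# add a new column computed from an existing one
df["senior"] = df["age"] >= 30

# change a single cell, addressed by row label and column name
df.loc[0, "age"] = 26

print(s.tolist())
print(df)
print(snapshot)
```

Because `snapshot` was created with `copy()`, it still holds the original values after `df` has been changed.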


# Q7: Create a DataFrame using multiple Series. Explain with an example.

## To create a DataFrame using multiple Series, we can use the 'pd.DataFrame()' function and pass a dictionary where each key corresponds to a column name and each value corresponds to a Series. Here's an example:


In [6]:
import pandas as pd

# create the first Series
names = pd.Series(["Alice", "Bob", "Charlie", "Dave", "Emily"])

# create the second Series
ages = pd.Series([25, 30, 35, 40, 45])

# create the third Series
genders = pd.Series(["F", "M", "M", "M", "F"])

# create the DataFrame by combining the three Series
df = pd.DataFrame({"Name": names, "Age": ages, "Gender": genders})

# print the DataFrame
print(df)


      Name  Age Gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3     Dave   40      M
4    Emily   45      F
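
When the Series passed to 'pd.DataFrame()' do not share the same index, pandas aligns them on their labels and fills the gaps with NaN. The following is a small sketch of that behaviour, with index values chosen only for illustration.

```python
import pandas as pd

# two Series whose index labels only partly overlap (illustrative values)
names = pd.Series(["Alice", "Bob", "Charlie"], index=[0, 1, 2])
ages = pd.Series([30, 35, 40], index=[1, 2, 3])

# rows are matched by index label, not by position
df = pd.DataFrame({"Name": names, "Age": ages})
print(df)

# labels present in only one Series produce missing values
print(df.isna().sum())
```

In the example above this question uses, all three Series have the default index 0 to 4, so every row lines up and no missing values appear.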
