# ***22'Feb Pandas assignment***

# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [5]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


# Q2. Create a list variable containing 10 elements, apply the pandas.Series function to it, and print the result.

In [6]:
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
series = pd.Series(my_list)

print(series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


# Q3. Create a Pandas DataFrame that contains the following data:

In [8]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a 'DataFrame' in pandas, and how is it different from a pandas.Series? Explain with an example.

|                     | DataFrame                                         | pandas.Series                                     |
|---------------------|--------------------------------------------------|--------------------------------------------------|
| **What is it?**     | A DataFrame is like a table with rows and columns. | A pandas.Series is like a single column of data.  |
| **Structure**       | It's two-dimensional, making it suitable for structured data. | It's one-dimensional, ideal for working with a single dimension of data. |
| **Data Types**      | Each column in a DataFrame can have its own data type. | A pandas.Series has a single dtype for all of its values; mixed values are stored under the generic `object` dtype. |
| **Common Use Cases**| DataFrames are used for tasks like data manipulation and analysis. | Series are often used when you need to work with a single attribute or column of data. |
| **Flexibility**     | DataFrames provide a flexible way to handle structured data. | Series are valuable for operations on a single data dimension. |


 ***Example***
 
- **Suppose we have data related to student information, including their names, ages, and grades. We can represent this data using both a DataFrame and Series.**

- **Using a DataFrame:**

In [33]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Grade': [85, 92, 78]
}

df = pd.DataFrame(data)
df

# The 'df' DataFrame represents a structured table with rows and columns.


     Name  Age  Grade
0   Alice   25     85
1     Bob   30     92
2  Claire   27     78


- **Using a Series**

In [36]:
import pandas as pd

# Create Series for 'names', 'ages', and 'grades'
names = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
ages = pd.Series([25, 30, 27], name='Age')
grades = pd.Series([85, 92, 78], name='Grade')

# Print the Series
print("Names:")
print(names)
print("\nAges:")
print(ages)
print("\nGrades:")
print(grades)


Names:
0     Alice
1       Bob
2    Claire
Name: Name, dtype: object

Ages:
0    25
1    30
2    27
Name: Age, dtype: int64

Grades:
0    85
1    92
2    78
Name: Grade, dtype: int64
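- **How the two relate**

The link between the two structures can be seen by pulling a column out of a DataFrame. The following is a minimal sketch that reuses the student data above; the variable names `age_column` and `age_frame` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Grade': [85, 92, 78]
})

# Selecting a single column label returns a one-dimensional Series
age_column = df['Age']
print(type(age_column).__name__)
print(age_column.ndim, df.ndim)

# Selecting with a list of labels keeps the result two-dimensional
age_frame = df[['Age']]
print(type(age_frame).__name__)
print(df.shape)
```

In other words, a DataFrame can be thought of as a collection of Series that share one index.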


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

| Operation                      | Function/Method                      | Description                                                           | Example Usage                                    |
|---------------------------------|-------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------|
| **Selection and Filtering**    | `df[columns]` or `df.loc[]`         | Select specific columns or rows.                                     | `df[['Name', 'Age']]` or `df.loc[df['Age'] > 25]` |
|                                | `df[df['column'] > value]`          | Filter rows based on a condition.                                    | `df[df['Category'] == 'Electronics']`           |
|                                | `df.query('expression')`            | Filter rows using a query expression.                                | `df.query('Price > 100')`                        |
| **Sorting**                    | `df.sort_values(by='column')`        | Sort the DataFrame by one or more columns.                           | `df.sort_values(by='Date')`                      |
|                                | `df.sort_index()`                   | Sort the DataFrame by its index.                                     | `df.sort_index(ascending=False)`                |
| **Aggregation and Grouping**    | `df.groupby('column').agg(func)`    | Perform aggregation operations on grouped data.                       | `df.groupby('Category').agg({'Price': 'mean'})` |
|                                | `df.pivot_table()`                  | Create a pivot table to summarize data.                               | `df.pivot_table(index='Month', values='Revenue', aggfunc='sum')` |
| **Data Cleaning**              | `df.drop(columns=['column'])`       | Remove unnecessary columns.                                          | `df.drop(columns=['Notes'])`                   |
|                                | `df.fillna(value)`                  | Fill missing values.                                                | `df.fillna(0)`                                 |
|                                | `df.drop_duplicates()`              | Remove duplicate rows.                                              | `df.drop_duplicates(subset='Email')`             |
| **Merging and Joining DataFrames** | `pd.concat([df1, df2])`           | Concatenate DataFrames vertically or horizontally.                 | `pd.concat([df1, df2])`                       |
|                                | `df1.merge(df2, on='key_column', how='inner')` | Merge two DataFrames based on a common key column. | `df1.merge(df2, on='CustomerID', how='inner')`  |
| **Reshaping and Pivoting Data**  | `df.melt()`                        | Reshape data from wide to long format.                               | `df.melt(id_vars=['ID'], value_vars=['Q1', 'Q2', 'Q3'])` |
|                                | `pd.crosstab()`                    | Create a cross-tabulation table.                                    | `pd.crosstab(index=df['Category'], columns=df['Region'])` |
| **Applying Custom Functions**   | `df.apply(func)`                   | Apply a custom function to each row or column.                       | `df['Total'] = df.apply(lambda row: row['Quantity'] * row['Price'], axis=1)` |
|                                | `df.transform(func)`               | Apply a function that returns output of the same shape as its input. | `df['Scaled_Amount'] = df['Amount'].transform(lambda x: (x - x.mean()) / x.std())` |
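
***Example***

As one possible answer to the "when might you use it" part of the question, the sketch below filters, sorts, and aggregates a small table. The `sales` data and its column names are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sales records, made up for this example
sales = pd.DataFrame({
    'Category': ['Electronics', 'Books', 'Electronics', 'Books'],
    'Price': [250.0, 15.0, 120.0, 30.0],
    'Quantity': [1, 4, 2, 3]
})

# Column arithmetic: a vectorized alternative to df.apply(..., axis=1)
sales['Total'] = sales['Price'] * sales['Quantity']

# Filtering: keep only rows whose price is above 100
expensive = sales[sales['Price'] > 100]

# Sorting: largest total first
ranked = sales.sort_values(by='Total', ascending=False)

# Grouping and aggregation: total revenue per category
revenue = sales.groupby('Category')['Total'].sum()

print(expensive)
print(ranked)
print(revenue)
```

A typical use is a quick report: filter down to the rows of interest, then group and sum to get one figure per category.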


# Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

| Data Structure | Mutable (Yes/No) | Description                                    |
|----------------|-------------------|------------------------------------------------|
| Series         | Yes               | A one-dimensional labeled array in Pandas. You can change the values within a Series after it is created. |
| DataFrame      | Yes               | A two-dimensional tabular data structure in Pandas. You can modify the data, add or remove columns, and perform various data manipulation operations on a DataFrame. |
| Panel          | Yes (no longer available) | A data structure in Pandas that was used to handle three-dimensional data. Its contents could be modified, but Panel was deprecated in Pandas 0.20.0 and removed from the library in version 0.25.0, so it cannot be used in current releases. |
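
Mutability of the two structures still available in current pandas can be checked directly. This is a minimal sketch with made-up values; it changes existing objects in place rather than creating new ones.

```python
import pandas as pd

# A Series: existing values can be overwritten by label
series = pd.Series([4, 8, 15])
series[0] = 100
print(series.tolist())

# A DataFrame: values can be overwritten and columns can be added
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.loc[1, 'Age'] = 31
df['Grade'] = [85, 92]
print(df.columns.tolist())
print(df)
```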


# Q7. Create a DataFrame using multiple Series. Explain with an example.

In [37]:
import pandas as pd

# Create multiple Series
names = pd.Series(['Alice', 'Bob', 'Claire'])
ages = pd.Series([25, 30, 27])
grades = pd.Series([85, 92, 78])

# Create a DataFrame using the Series
data = {
    'Name': names,
    'Age': ages,
    'Grade': grades
}

df = pd.DataFrame(data)

# Print the resulting DataFrame
print(df)


     Name  Age  Grade
0   Alice   25     85
1     Bob   30     92
2  Claire   27     78
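
When Series are combined into a DataFrame, pandas lines the values up by index label rather than by position. The sketch below shows this with `pd.concat` as an alternative to the dictionary approach above; the shortened `grades` Series is an assumption made only to show the alignment.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
ages = pd.Series([25, 30, 27], name='Age')
# This Series deliberately has no value for index label 2
grades = pd.Series([85, 92], name='Grade')

# Each Series becomes a column, and its name becomes the column label
df = pd.concat([names, ages, grades], axis=1)
print(df)
```

The row for label 2 gets `NaN` in the `Grade` column because no Series supplied a value for it.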

