# Pandas Basics

#### 1) Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [2]:
import pandas as pd

# Named `series` rather than `df`, since this object is a Series, not a DataFrame
series = pd.Series([4, 8, 15, 16, 23, 42])
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


#### 2) Create a list variable containing 10 elements, pass it to the pandas.Series function, and print the result.

In [8]:
my_list = list(range(1,11))  # Create a list with 10 elements (1-10)

series = pd.Series(my_list)
print(series)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


#### 3) Create a Pandas DataFrame that contains the following data, then print the DataFrame.

In [9]:
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
})

print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


#### 4) What is a "DataFrame" in pandas, and how is it different from a pandas.Series? Explain with an example.

DataFrame vs. Series

* A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).  It's essentially a single column of data.

* A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types.  Think of it as a table, similar to a spreadsheet or SQL table, or a dictionary of Series objects.  Each column in a DataFrame is a Series.


Key Differences:
* Dimensionality: Series is 1D, DataFrame is 2D.
* Structure: Series is like a single column, DataFrame is like a table with multiple columns.
* Columns: A DataFrame can have multiple columns (each a Series), while a Series has only one.
* Data types: A Series has a single dtype for all of its values, while a DataFrame can hold a different dtype in each column, which makes it better suited to tabular data.
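To make the "each column is a Series" point concrete, here is a minimal sketch. The names `people`, `age_column`, and `age_frame` are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd

# Illustrative data; these variable names are assumptions for this sketch.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
})

# Selecting a single column with [] returns a one-dimensional Series.
age_column = people['Age']
print(type(age_column).__name__, age_column.ndim)

# Selecting with a list of column names keeps the result two-dimensional.
age_frame = people[['Age']]
print(type(age_frame).__name__, age_frame.ndim)
```

The single-bracket versus double-bracket distinction is a common source of confusion: the first yields a Series, the second a one-column DataFrame.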


In [11]:
# Series Example

series_data = pd.Series([4, 8, 15, 16, 23, 42])
# sep="\n" puts the label on its own line without indenting the first output row
print("Series:", series_data, sep="\n")


# DataFrame Example

data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}

df_data = pd.DataFrame(data)
print("\nDataFrame:", df_data, sep="\n")

Series:
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


#### 5) What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

In [12]:
# 1. head() and tail(): Display the first or last few rows of the DataFrame. Useful for quickly inspecting the data.
print(df.head(2))
print(df.tail(1))

# 2. describe(): Generate descriptive statistics (count, mean, std, min, max, etc.) for numerical columns.
print(df.describe())

# 3. info(): Display a concise summary of the DataFrame, including data types and non-null counts. Helpful for understanding data structure and potential missing values.
# info() prints the summary itself and returns None, so it is called without print().
df.info()

# 4. sort_values(): Sort the DataFrame by one or more columns.
print(df.sort_values(by='Age', ascending=False))

# 5. groupby(): Split the rows into groups and aggregate each group.
# Example use: given data on sales transactions, total the sales per product.
data = {'Product': ['A', 'A', 'B', 'B', 'C'], 'Sales': [100, 150, 200, 250, 120]}
sales_df = pd.DataFrame(data)

# Group by product and sum sales for each product
product_sales = sales_df.groupby('Product')['Sales'].sum()
product_sales

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
     Name  Age  Gender
2  Claire   27  Female
             Age
count   3.000000
mean   27.333333
std     2.516611
min    25.000000
25%    26.000000
50%    27.000000
75%    28.500000
max    30.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    3 non-null      object
 1   Age     3 non-null      int64
 2   Gender  3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
     Name  Age  Gender
1     Bob   30    Male
2  Claire   27  Female
0   Alice   25  Female


Product
A    250
B    450
C    120
Name: Sales, dtype: int64
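As a possible extension of the per-product total, `agg()` can compute several statistics per group in one call. This is a minimal sketch that reuses the same sales data; the name `summary` is an assumption introduced for illustration.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B', 'C'],
    'Sales': [100, 150, 200, 250, 120],
})

# Compute the total, average, and number of transactions for each product.
# Passing a list of function names to agg() yields one column per statistic.
summary = sales_df.groupby('Product')['Sales'].agg(['sum', 'mean', 'count'])
print(summary)
```

This might be useful when a report needs both the total revenue and the number of transactions per product.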


#### 6) Which of the following are mutable: Series, DataFrame, or Panel?

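Both Series and DataFrame are value-mutable: the values they hold can be changed in place. They differ in size-mutability: per the pandas documentation, columns can be inserted into a DataFrame, while the length of a Series cannot be changed. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it does not exist in current versions. Therefore, the practical answer is **Series and DataFrame**.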
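A quick way to see this mutability is to modify both kinds of object in place. The sketch below uses the illustrative names `numbers` and `people`, which are assumptions and not part of the original exercise.

```python
import pandas as pd

numbers = pd.Series([4, 8, 15])
numbers.iloc[0] = 99              # overwrite a value in the existing Series

people = pd.DataFrame({'Age': [25, 30]})
people.loc[0, 'Age'] = 26         # overwrite a single cell in place
people['City'] = ['Paris', 'Tokyo']  # add a new column to the same DataFrame

print(numbers)
print(people)
```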

#### 7) Create a DataFrame using multiple Series. Explain with an example.

In [13]:
# Create individual Series
names = pd.Series(['Alice', 'Bob', 'Charlie', 'David'])
ages = pd.Series([25, 30, 28, 22])
cities = pd.Series(['New York', 'London', 'Paris', 'Tokyo'])

# Combine the Series into a DataFrame
data = {'Name': names, 'Age': ages, 'City': cities}
df = pd.DataFrame(data)

# Display the DataFrame (a notebook renders the last expression in a cell)
df

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   28     Paris
3    David   22     Tokyo
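When a dictionary of Series is passed to `pd.DataFrame`, the keys become column names and the Series are aligned on their index labels. The Series in the example above share the default index 0 to 3, so the rows line up directly. The following sketch, with hypothetical labels chosen only for illustration, shows what happens when the indexes overlap only partly.

```python
import pandas as pd

# Hypothetical Series whose index labels overlap only on 'bob'.
ages = pd.Series([25, 30], index=['alice', 'bob'])
cities = pd.Series(['London', 'Paris'], index=['bob', 'charlie'])

# The resulting index is the union of both indexes; labels missing from
# one Series are filled with NaN in that column.
combined = pd.DataFrame({'Age': ages, 'City': cities})
print(combined)
```

Because of this alignment, it is worth checking that the Series share the intended index before combining them.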
