## How to Interact with this Jupyter Notebook

In this activity, you will use a Jupyter Notebook, which integrates both text and code. The gray boxes contain executable code, which you will run in order to view its output. The text in between the code provides instructions.

## Scenario: Charting the Customer Journey with Pandas

Imagine you're a Python developer at a rapidly growing e-commerce company. The marketing team is eager to understand customer behavior and preferences to tailor their campaigns and improve the overall shopping experience. They've provided you with a valuable dataset containing information about customers, their purchases, and demographics. 

Your task is to leverage your Python skills and the power of the Pandas library to load this dataset, explore its structure, and uncover preliminary insights that will guide further analysis. This initial exploration is crucial for understanding the data you're working with and making informed decisions about how to proceed with more in-depth analysis and visualization.

In the cell below, begin by importing the `pandas` library with the alias `pd`. Then, use `pd.read_csv()` to load the `customer_data_50.csv` file into a DataFrame named `customer_data`.

Lastly, run the cell.

In [15]:
# Import the pandas library with the alias 'pd'
import pandas as pd

# Load the CSV file 'customer_data_50.csv' into a DataFrame
customer_data = pd.read_csv('customer_data_50.csv')
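
As a minimal sketch of how `pd.read_csv()` behaves, the example below reads CSV text from memory instead of from `customer_data_50.csv`, so it runs without the file. The variable names `csv_text` and `sample` are illustrative, and the three rows are a small subset copied from the first rows of this notebook's dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: a header line plus three rows,
# held in memory as plain text.
csv_text = """customer_id,first_name,last_name,age,city,state
1001,Sophia,Smith,54,San Antonio,TX
1002,Joseph,Smith,66,Los Angeles,CA
1003,John,Anderson,56,Phoenix,AZ
"""

# pd.read_csv accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # DataFrame
print(sample.shape)           # (3, 6)
```

The header line becomes the column names, and each following line becomes one row.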

Run the following cell, which will check the dimensions of your DataFrame using the `.shape` attribute. This tells you how many rows and columns your data has – kind of like figuring out the size of a spreadsheet!

In [16]:
# Display the shape of the DataFrame (rows, columns)
print("\nShape of the DataFrame (rows, columns):", customer_data.shape)


Shape of the DataFrame (rows, columns): (50, 13)
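
The `.shape` attribute is a plain tuple, so it can be unpacked into separate variables. The sketch below assumes a small hand-built DataFrame named `sample` (an illustrative name) rather than the full dataset.

```python
import pandas as pd

# Hypothetical three-row sample with two columns.
sample = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "age": [54, 66, 56],
})

# .shape is an attribute, not a method, so it takes no parentheses.
n_rows, n_cols = sample.shape
print(n_rows, n_cols)  # 3 2

# len() on a DataFrame returns the number of rows only.
print(len(sample))     # 3
```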


Next, you'll inspect the data using the `df.head()` method, which returns the first few rows of a DataFrame (five by default). This gives you a quick look at the data's structure and content.

In the cell below, use `df.head()` to display the first 5 rows of the `customer_data` DataFrame. Then, run the cell and take a moment to observe the output.

In [20]:
# Display the first 5 rows
print("First 5 rows:\n")

# insert code here 
print(customer_data.head())

First 5 rows:

   customer_id first_name  last_name                       email gender  age  \
0         1001     Sophia      Smith    sophia.smith@example.com      M   54   
1         1002     Joseph      Smith    joseph.smith@example.com      M   66   
2         1003       John   Anderson   john.anderson@example.com      F   56   
3         1004       Emma  Hernandez  emma.hernandez@example.com      M   44   
4         1005      Emily     Garcia    emily.garcia@example.com      F   25   

          city state country  purchase_count  total_spend  avg_order_value  \
0  San Antonio    TX     USA               5          965            193.0   
1  Los Angeles    CA     USA               7         1246            178.0   
2      Phoenix    AZ     USA               1          199            199.0   
3  Los Angeles    CA     USA              14         3752            268.0   
4       Dallas    TX     USA              12         1620            135.0   

           last_purchase_date  
0  
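
`head()` also accepts a row count, and `tail()` is its counterpart for the end of the DataFrame. This is a short sketch on a hand-built five-row DataFrame; the name `sample` is illustrative, and only three of the dataset's columns are included.

```python
import pandas as pd

# Hypothetical five-row sample using a few columns from the dataset.
sample = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004, 1005],
    "first_name": ["Sophia", "Joseph", "John", "Emma", "Emily"],
    "age": [54, 66, 56, 44, 25],
})

first_two = sample.head(2)  # first two rows
last_two = sample.tail(2)   # last two rows

print(first_two)
print(last_two)
```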

Now, you'll use the `df.info()` method, which prints a concise summary of the DataFrame, including the column names, their data types, and the number of non-null values.

In the cell below, use `df.info()` to print information about the `customer_data` DataFrame. Then, run the cell and take a moment to observe the output.

In [18]:
# Print the column names and their data types
print("\nColumn names and their data types:\n")

# insert code here 
# .info() prints its summary directly and returns None, so print() adds a final "None"
print(customer_data.info())


Column names and their data types:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   customer_id         50 non-null     int64  
 1   first_name          50 non-null     object 
 2   last_name           50 non-null     object 
 3   email               50 non-null     object 
 4   gender              50 non-null     object 
 5   age                 50 non-null     int64  
 6   city                50 non-null     object 
 7   state               50 non-null     object 
 8   country             50 non-null     object 
 9   purchase_count      50 non-null     int64  
 10  total_spend         50 non-null     int64  
 11  avg_order_value     50 non-null     float64
 12  last_purchase_date  50 non-null     object 
dtypes: float64(1), int64(4), object(8)
memory usage: 5.2+ KB
None
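
The pieces of the `info()` summary can also be retrieved as values. The sketch below, on a hypothetical three-row DataFrame named `sample`, shows that `info()` itself returns `None`, and uses `.dtypes` and `.notna().sum()` to get the column types and non-null counts.

```python
import pandas as pd

# Hypothetical three-row sample with an integer, a text, and a float column.
sample = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "first_name": ["Sophia", "Joseph", "John"],
    "avg_order_value": [193.0, 178.0, 199.0],
})

# info() writes its summary to the screen and returns None.
returned = sample.info()
print(returned)  # None

# The same facts as values: one data type and one non-null count per column.
column_types = sample.dtypes
non_null_counts = sample.notna().sum()
print(column_types)
print(non_null_counts)
```

Because `info()` prints on its own, calling it without `print()` avoids the extra `None` line.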


Next, you'll use the `df.describe()` method, which generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numerical columns in the DataFrame.

In the cell below, use `df.describe()` to display summary statistics for the numerical columns in the `customer_data` DataFrame.

In [19]:
# Display descriptive statistics for numerical columns
print("\nDescriptive statistics for numerical columns:\n")

# insert code here 
print(customer_data.describe())


Descriptive statistics for numerical columns:

       customer_id        age  purchase_count  total_spend  avg_order_value
count     50.00000  50.000000        50.00000    50.000000        50.000000
mean    1025.50000  43.440000         8.60000  1491.880000       179.920000
std       14.57738  14.833993         4.28095   968.697666        70.820221
min     1001.00000  19.000000         1.00000   199.000000        53.000000
25%     1013.25000  30.000000         5.00000   819.000000       125.750000
50%     1025.50000  44.500000         8.00000  1350.000000       180.000000
75%     1037.75000  54.000000        12.00000  1916.000000       237.500000
max     1050.00000  69.000000        15.00000  4440.000000       299.000000
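
The result of `describe()` is itself a DataFrame, so individual statistics can be looked up by row label and column name. This sketch assumes a hand-built five-row DataFrame named `sample` (illustrative), with one text column to show that it is left out by default.

```python
import pandas as pd

# Hypothetical five-row sample: one text column and two numeric columns.
sample = pd.DataFrame({
    "first_name": ["Sophia", "Joseph", "John", "Emma", "Emily"],
    "age": [54, 66, 56, 44, 25],
    "total_spend": [965, 1246, 199, 3752, 1620],
})

summary = sample.describe()

# Only the numeric columns appear in the summary.
print(list(summary.columns))             # ['age', 'total_spend']

# Look up single statistics with .loc[row_label, column_name].
print(summary.loc["mean", "age"])        # 49.0
print(summary.loc["50%", "total_spend"]) # 1246.0
```

The `50%` row is the median, which is why it matches the value `.median()` would return for the same column.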


Finally, in the code cell below, you'll call the `.mean()` and `.median()` methods on the `'age'` column of `customer_data` to calculate the average and median age of all your customers.

Square brackets `[]` select a column in Pandas. Inside the brackets, you give the name of the column you want to extract, which in this case is `'age'`.

Run the cell to see the average and median age of your customers.

In [21]:
# Calculate the mean of the 'age' column
mean_age = customer_data['age'].mean()

# Print the mean age
print("\nMean Age:", mean_age)

# Calculate the median of the 'age' column
median_age = customer_data['age'].median()

# Print the median age
print("\nMedian Age:", median_age)


Mean Age: 43.44

Median Age: 44.5
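
To make the difference between selecting a column and summarizing it concrete, here is a small sketch on five hypothetical ages. Selecting with `[]` returns a whole column (a Series); only the call to `.mean()` or `.median()` reduces it to a single number. The names `sample` and `ages` are illustrative.

```python
import pandas as pd

# Hypothetical five-row sample with a single numeric column.
sample = pd.DataFrame({"age": [54, 66, 56, 44, 25]})

# Square brackets return the column as a Series, not a number.
ages = sample["age"]
print(type(ages).__name__)  # Series

# The methods reduce the Series to one value each.
mean_age = ages.mean()      # (54 + 66 + 56 + 44 + 25) / 5
median_age = ages.median()  # middle value of 25, 44, 54, 56, 66
print(mean_age)    # 49.0
print(median_age)  # 54.0
```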


## Activity Recap: Charting the Customer Journey with Pandas

Congratulations! In this activity, you learned how to load a CSV file into a Pandas DataFrame and inspect its structure and contents:

* `pd.read_csv()` loads CSV data into a DataFrame.
* `df.shape` gives the number of rows and columns.
* `df.head()` shows the first few rows.
* `df.info()` provides a summary of the DataFrame's structure.
* `df.describe()` generates descriptive statistics for numerical columns.
* `.mean()` and `.median()` compute the average and median of a column, such as `customer_data['age']`.