# Pandas 2 (More Loading and Exploring Data)

## Labubu 2024 Sales Data
<center>
<img src="https://upload.wikimedia.org/wikipedia/en/a/a9/Pop_Mart_Labubu_The_Monsters_Exciting_Macaron.jpg">
</center>
image credit: wikimedia
<br />
<br />

Let's import and explore a small dataset on Popmart's Labubu sales.

## Importing Pandas

First things first, import pandas in the code cell below.

__Syntax__:

```python
import pandas as pd
```

In [None]:
# Import Pandas
import pandas as pd

## Loading a DataFrame: Dictionary

We will load this dataset from a list of Python dictionaries.

The list contains regional sales figures for Popmart's Labubu blind-box dolls.

__Syntax__:
```python
dataframe = pd.DataFrame(data)
```

In [None]:
## Load Data from a List of Dictionaries

## Data Source: https://kr-asia.com/pop-marts-labubu-breaks-records-in-thailand-can-the-trend-go-global

data = [
    {"Region": "Southeast Asia", "Revenue (USD) (M)": 78.4, "Revenue (CNY) (M)": 560.0},
    {"Region": "East Asia", "Revenue (USD) (M)": 67.0, "Revenue (CNY) (M)": 480.0},
    {"Region": "North America", "Revenue (USD) (M)": 25.2, "Revenue (CNY) (M)": 180.0},
    {"Region": "Europe and Other Markets", "Revenue (USD) (M)": 19.6, "Revenue (CNY) (M)": 140.0}
]

labubu_sales_df = pd.DataFrame(data)
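
To see how a list of dictionaries maps onto a DataFrame, here is a minimal sketch. The records below are hypothetical toy values, not part of the Labubu dataset; the names `records` and `toy_df` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy records (not from the Labubu data source)
records = [
    {"name": "A", "units": 3},
    {"name": "B", "units": 5, "note": "restock"},
]

toy_df = pd.DataFrame(records)

# Each dictionary becomes one row; the keys become column names
print(toy_df.shape)
print(toy_df.columns.tolist())

# A key that is missing from a record shows up as NaN in that row
print(toy_df["note"].isna().tolist())
```

In the sales data above every dictionary has the same three keys, so no missing values are expected there.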

## Initial DataFrame Inspection

For each code cell below, let's do some initial inspection and exploration of the data.

* __`df.head()`:__ to see the beginning of your data.
* __`df.describe()`:__ for a statistical summary of numerical columns.
* __`df.info()`:__ to check data types and non-null values.
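
As a rough illustration of what `df.describe()` returns, the sketch below rebuilds the USD column from this notebook in a standalone frame (the names `df` and `summary` are just for this example). The result is itself a DataFrame, so individual statistics can be looked up by label.

```python
import pandas as pd

# Same regions and USD figures as the sales data in this notebook
df = pd.DataFrame({
    "Region": ["Southeast Asia", "East Asia", "North America",
               "Europe and Other Markets"],
    "Revenue (USD) (M)": [78.4, 67.0, 25.2, 19.6],
})

summary = df.describe()

# By default only numeric columns are summarized, so "Region" is skipped
print(summary.columns.tolist())

# Rows of the summary are labeled by statistic name
print(summary.index.tolist())

# Look up single statistics with .loc[row_label, column_label]
print(summary.loc["mean", "Revenue (USD) (M)"])
print(summary.loc["max", "Revenue (USD) (M)"])
```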

In [None]:
## View Data Types and Non-Null Values
labubu_sales_df.info()



In [None]:
## View Beginning of Data
labubu_sales_df.head()



In [None]:
## View Statistical Summary
labubu_sales_df.describe()



## Total Sales
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Pop_Mart_store_at_Lize_Paradise_Walk_%2820210911185655%29.jpg/1280px-Pop_Mart_store_at_Lize_Paradise_Walk_%2820210911185655%29.jpg" width=60%>
</center>
image credit: wikimedia
<br/>
<br/>

What is Popmart's total Labubu sales revenue across all regions?

In the cells below, calculate Popmart's total sales in USD (US Dollar) and CNY (Chinese Yuan).

__Method__

The `.sum()` method is useful for getting the total value of an entire column.

__Syntax__:
```python
total = dataframe[column].sum()
```
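
`.sum()` also works on several columns at once. The sketch below is one possible variation, not a required step: it selects both revenue columns from a standalone copy of the data and returns one total per column. The names `df`, `revenue_columns`, and `totals` are made up for this example.

```python
import pandas as pd

# Standalone copy of the revenue figures used in this notebook
df = pd.DataFrame({
    "Region": ["Southeast Asia", "East Asia", "North America",
               "Europe and Other Markets"],
    "Revenue (USD) (M)": [78.4, 67.0, 25.2, 19.6],
    "Revenue (CNY) (M)": [560.0, 480.0, 180.0, 140.0],
})

# Selecting a list of columns gives a DataFrame;
# calling .sum() on it returns a Series with one total per column
revenue_columns = ["Revenue (USD) (M)", "Revenue (CNY) (M)"]
totals = df[revenue_columns].sum()

print(totals)
```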

In [None]:
## What is the Total Revenue in USD?
revenue_usd = labubu_sales_df["Revenue (USD) (M)"].sum()

print(f"Popmart's total revenue across all Regions, in USD, was ${revenue_usd:.1f} (M)")

In [None]:
## What is the Total Revenue in CNY?
revenue_cny = labubu_sales_df["Revenue (CNY) (M)"].sum()

print(f"Popmart's total revenue across all Regions, in CNY, was ¥{revenue_cny} (M)")

## North America's Slice

<center><img src="../images/web/labubu_jordans.jpg" width= 60%></center>
Now that we have the total sales, how much was from North America?

We will have to isolate North America from the rest of the data.
In order to isolate the data, we can create a boolean mask.

We'll go through it step by step in the code cells below.

In [None]:
# North America's Slice of the Pie

# Create a Boolean Mask
mask = labubu_sales_df["Region"] == "North America"

# Output the Mask
mask

In [None]:
# Filter North America from the Data Frame
filtered_labubu_sales_df = labubu_sales_df[mask]

# Output the Filtered DataFrame
filtered_labubu_sales_df
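
For a closer look at what a boolean mask is, here is a small sketch on hypothetical toy data (the `Units` column and its values are invented for illustration). The mask is a Series of `True`/`False` values, one per row, and indexing with it keeps only the `True` rows.

```python
import pandas as pd

# Hypothetical toy data, not the Labubu sales figures
toy_df = pd.DataFrame({
    "Region": ["East Asia", "North America", "Europe"],
    "Units": [10, 4, 3],
})

# Comparing a column to a value yields one boolean per row
mask = toy_df["Region"] == "North America"
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
matches = toy_df[mask]
print(matches)

# .loc can apply the row mask and pick a column in one step
units = toy_df.loc[mask, "Units"]
print(units.tolist())
```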


In [None]:
# Isolate the Revenue as a single value using .item()
north_america_revenue_usd = filtered_labubu_sales_df["Revenue (USD) (M)"].item()

# Output the Isolated Revenue
north_america_revenue_usd
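
Why `.item()`? Selecting a column from the filtered DataFrame gives a Series, even when it holds one value. `.item()` pulls that value out as a plain Python number, and it raises a `ValueError` if the Series does not contain exactly one element. A minimal sketch with made-up Series names:

```python
import pandas as pd

# A one-element Series, like the filtered revenue column
one_value = pd.Series([25.2])

# A two-element Series, as if the mask had matched two rows
two_values = pd.Series([25.2, 19.6])

# With exactly one element, .item() returns a plain Python scalar
value = one_value.item()
print(value, type(value))

# With more than one element, .item() raises ValueError
raised = False
try:
    two_values.item()
except ValueError:
    raised = True
    print("Cannot call .item() on a Series with more than one value")
```

This makes `.item()` a handy check that the mask matched exactly one row.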

In [None]:
# Calculate the North America Sales Percentage
na_sales_percentage = (north_america_revenue_usd / revenue_usd) * 100

# Output the Results
print(f"North American Sales were {na_sales_percentage:.2f}% of Popmart's total sales in 2024")
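
The same percentage can be computed for every region at once by dividing the whole column by its total. This is a sketch of an alternative approach, not the method used in the cells above; `df`, `usd`, and `shares` are names chosen for this example.

```python
import pandas as pd

# Standalone copy of the USD figures used in this notebook
df = pd.DataFrame({
    "Region": ["Southeast Asia", "East Asia", "North America",
               "Europe and Other Markets"],
    "Revenue (USD) (M)": [78.4, 67.0, 25.2, 19.6],
})

# Use Region as the index so each share can be looked up by name
usd = df.set_index("Region")["Revenue (USD) (M)"]

# Dividing a Series by a single number applies to every row
shares = usd / usd.sum() * 100

print(shares.round(2))
print(f"North America: {shares.loc['North America']:.2f}%")
```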


## Loading a DataFrame: CSV File

__Syntax__:
```python
dataframe = pd.read_csv(file_path)
```
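
To try `pd.read_csv` without a file on disk, the sketch below reads CSV text from an in-memory buffer. The column names and values are hypothetical placeholders; the real CSV used later in this notebook may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical CSV content (placeholder regions and values)
csv_text = """Region,Interest
Region A,100
Region B,85
Region C,12
"""

# read_csv accepts a file path or a file-like object such as StringIO
toy_df = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names
print(toy_df.columns.tolist())
print(toy_df.shape)

# Column types are inferred from the text
print(toy_df.dtypes)
```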

## Labubu Search Interest

<center><img src="../images/web/labubu_hot_not.avif" width=60%></center>
image credit: The Atlantic
<br/>
<br/>
Let's load some data about Labubu's Search Interest

In [None]:
# Create Data Frame
# Data Source: Google Trends

file = "../data/labubu_interest_by_region.csv"

labubu_interest_df = pd.read_csv(file)


## Initial DataFrame Inspection

For each code cell below, let's do some initial inspection and exploration of the data.


* __`df.info()`:__ to check data types and non-null values.
* __`df.head()`:__ to see the beginning of your data.
* __`df.tail()`:__ to see the end of your data.
* __`df.describe()`:__ for a statistical summary of numerical columns.

In [None]:
# Data Types and Information
labubu_interest_df.info()

In [None]:
# Beginning of Data
labubu_interest_df.head()

In [None]:
# End of Data
labubu_interest_df.tail()



In [None]:
# Statistical Summary
labubu_interest_df.describe()

## Sort Data: View Top 10 Regions

Where is Labubu the most popular (in terms of Google Search)?

We'll need an additional method to make this happen: `df.sort_values()`.
 
__Syntax__:

```python
dataframe.sort_values(by=column_name, ascending=False).head(10)
```

What additional method will we use to view the Top 10 Regions after they have been sorted?

In [None]:
### Sort by Interest - Top 10
labubu_interest_df.sort_values(by="Interest",
                               ascending = False).head(10)
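
Two things worth knowing about sorting, sketched here on hypothetical toy data (regions and values are invented): `sort_values()` returns a new DataFrame and leaves the original untouched, and `nlargest()` is a shorter way to get the same top rows.

```python
import pandas as pd

# Hypothetical toy data, not the Google Trends file
toy_df = pd.DataFrame({
    "Region": ["A", "B", "C", "D"],
    "Interest": [40, 100, 7, 65],
})

# Sort from highest to lowest, then keep the first two rows
top_two = toy_df.sort_values(by="Interest", ascending=False).head(2)
print(top_two)

# The original DataFrame keeps its original row order
print(toy_df["Region"].tolist())

# nlargest(n, column) gives the same top rows in one call
top_two_alt = toy_df.nlargest(2, "Interest")
print(top_two_alt)
```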

## Sort Data: View Bottom 10 Regions

Can we apply similar techniques to discover the bottom 10 regions for Labubu search interest?

Give it a try in the code cell below.

In [None]:
## Bottom 10 by Interest
labubu_interest_df.sort_values(by="Interest",
                               ascending=False).tail(10)
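
One caveat, shown on hypothetical toy data: `sort_values()` places missing values (NaN) at the end by default, so `.tail()` on a descending sort can return rows with no value at all. Whether the real CSV contains blanks is an assumption here; check the `.info()` output above. If it does, dropping those rows first is one option.

```python
import pandas as pd

# Hypothetical toy data; region C has no recorded interest
toy_df = pd.DataFrame({
    "Region": ["A", "B", "C", "D", "E"],
    "Interest": [40, 100, None, 65, 7],
})

# NaN rows are sorted to the end by default, so tail() picks them up
with_nan = toy_df.sort_values(by="Interest", ascending=False).tail(2)
print(with_nan)

# Drop rows with a missing Interest, then sort lowest-first and take the head
cleaned = toy_df.dropna(subset=["Interest"])
bottom_two = cleaned.sort_values(by="Interest", ascending=True).head(2)
print(bottom_two)
```

Sorting with `ascending=True` and taking `.head()` lists the lowest value first, while the descending sort with `.tail()` lists it last.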