# Pandas 2 (More Loading and Exploring Data)

## Labubu 2024 Sales Data
<center>
<img src="https://upload.wikimedia.org/wikipedia/en/a/a9/Pop_Mart_Labubu_The_Monsters_Exciting_Macaron.jpg">
</center>
Image credit: Wikimedia
<br />
<br />

Let's import and explore a small dataset on Popmart's Labubu sales.

## Importing Pandas

First things first, import pandas in the code cell below.

__Syntax__:

```python
import pandas as pd
```

In [2]:
# Import Pandas
import pandas as pd

## Loading a DataFrame: Dictionary

We will be loading this particular dataset from a list of Python Dictionaries.

The list contains regional revenue figures for Popmart's Labubu blind-box dolls.

__Syntax__:
```python
dataframe = pd.DataFrame(data)
```
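
As a minimal sketch of how this works: each dictionary becomes one row, and the dictionary keys become column names. The records below are made up for illustration and are not part of the Labubu dataset.

```python
import pandas as pd

# Hypothetical records, for illustration only
records = [
    {"Item": "A", "Price": 1.5},
    {"Item": "B", "Price": 2.0},
    {"Item": "C"},  # a missing "Price" key becomes NaN in that row
]

example_df = pd.DataFrame(records)

print(example_df.columns.tolist())       # ['Item', 'Price']
print(example_df.shape)                  # (3, 2)
print(example_df["Price"].isna().sum())  # 1
```

Dictionaries do not all need the same keys; pandas fills any gaps with `NaN`.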

In [7]:
## Load Data from a List of Dictionaries

## Data Source: https://kr-asia.com/pop-marts-labubu-breaks-records-in-thailand-can-the-trend-go-global

data = [
    {"Region": "Southeast Asia", "Revenue (USD) (M)": 78.4, "Revenue (CNY) (M)": 560.0},
    {"Region": "East Asia", "Revenue (USD) (M)": 67.0, "Revenue (CNY) (M)": 480.0},
    {"Region": "North America", "Revenue (USD) (M)": 25.2, "Revenue (CNY) (M)": 180.0},
    {"Region": "Europe and Other Markets", "Revenue (USD) (M)": 19.6, "Revenue (CNY) (M)": 140.0}
]

labubu_sales = pd.DataFrame(data) 

## Initial DataFrame Inspection

In each code cell below, let's do some initial inspection and exploration of the data.

* __`df.head()`:__ to see the beginning of your data.
* __`df.describe()`:__ for a statistical summary of numerical columns.
* __`df.info()`:__ to check data types and non-null values.

In [8]:
## View Data Types and Non-Null Values

labubu_sales.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Region             4 non-null      object 
 1   Revenue (USD) (M)  4 non-null      float64
 2   Revenue (CNY) (M)  4 non-null      float64
dtypes: float64(2), object(1)
memory usage: 228.0+ bytes


In [9]:
## View Beginning of Data

labubu_sales.head()


|   | Region | Revenue (USD) (M) | Revenue (CNY) (M) |
|---|--------|-------------------|-------------------|
| 0 | Southeast Asia | 78.4 | 560.0 |
| 1 | East Asia | 67.0 | 480.0 |
| 2 | North America | 25.2 | 180.0 |
| 3 | Europe and Other Markets | 19.6 | 140.0 |


In [10]:
## View Statistical Summary

labubu_sales.describe()


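|       | Revenue (USD) (M) | Revenue (CNY) (M) |
|-------|-------------------|-------------------|
| count | 4.0 | 4.0 |
| mean  | 47.55 | 340.0 |
| std   | 29.5 | 211.029224 |
| min   | 19.6 | 140.0 |
| 25%   | 23.8 | 170.0 |
| 50%   | 46.1 | 330.0 |
| 75%   | 69.85 | 500.0 |
| max   | 78.4 | 560.0 |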


## Total Sales
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Pop_Mart_store_at_Lize_Paradise_Walk_%2820210911185655%29.jpg/1280px-Pop_Mart_store_at_Lize_Paradise_Walk_%2820210911185655%29.jpg" width=60%>
</center>
Image credit: Wikimedia
<br/>
<br/>

What is Popmart's total Labubu sales revenue across all regions?

In the cells below, calculate Popmart's total sales in USD (US Dollars) and CNY (Chinese Yuan).

__Method__

The `.sum()` method is useful for getting the total value of an entire column.

__Syntax__:
```python
total = dataframe[column].sum()
```
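
A small sketch of `.sum()` on one column versus several columns at once. The `orders` DataFrame and its column names are hypothetical, chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

# Hypothetical data, for illustration only
orders = pd.DataFrame({
    "Store": ["North", "South", "West"],
    "Units": [10, 20, 30],
    "Returns": [1, 2, 3],
})

# Summing a single column returns one number
total_units = orders["Units"].sum()

# Summing a list of columns returns a Series with one total per column
totals = orders[["Units", "Returns"]].sum()

print(total_units)        # 60
print(totals["Returns"])  # 6
```

Passing a list of column names is a convenient way to total both revenue columns in a single call.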

In [14]:
## What is the Total Revenue in USD?
total_revenue_usd = labubu_sales["Revenue (USD) (M)"].sum()
print(f"Total Revenue USD: ${total_revenue_usd} Million")

Total Revenue USD: $190.2 Million


In [17]:
## What is the Total Revenue in CNY?
total_revenue_cny = labubu_sales["Revenue (CNY) (M)"].sum()
print(f"Total Revenue CNY: ¥{total_revenue_cny} Million")

Total Revenue CNY: ¥1360.0 Million


## North America's Slice

<center><img src="../images/web/labubu_jordans.jpg" width= 60%></center>
Now that we have the total sales, how much was from North America?

We will have to isolate North America from the rest of the data.
In order to isolate the data, we can create a boolean mask.

We'll go through it step by step in the code cells below.

In [27]:
# North America's Slice of the Pie

# Create a Boolean Mask
north_america = labubu_sales["Region"] == "North America"
north_america

0    False
1    False
2     True
3    False
Name: Region, dtype: bool

In [28]:
# Filter North America from the Data Frame
filtered_df = labubu_sales[north_america]
filtered_df

|   | Region | Revenue (USD) (M) | Revenue (CNY) (M) |
|---|--------|-------------------|-------------------|
| 2 | North America | 25.2 | 180.0 |


In [29]:
# Isolate the revenue value using .item(), which returns the single value in a one-element Series
north_america_usd = filtered_df["Revenue (USD) (M)"].item()
north_america_usd

25.2
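
The mask, filter, and `.item()` steps can also be combined with `.loc`, which takes a row mask and a column label together. This is one possible shortcut, not the only way to do it; the `scores` DataFrame below is hypothetical. The sketch also shows that `.item()` raises a `ValueError` when the filter leaves more than one value.

```python
import pandas as pd

# Hypothetical data, for illustration only
scores = pd.DataFrame({
    "Team": ["Red", "Blue", "Green", "Blue"],
    "Points": [3, 7, 5, 2],
})

# .loc accepts a boolean mask for the rows and a column label in one step
red_points = scores.loc[scores["Team"] == "Red", "Points"].item()
print(red_points)  # 3

# .item() only works when exactly one value remains after filtering
try:
    scores.loc[scores["Team"] == "Blue", "Points"].item()
except ValueError as error:
    print("More than one match:", error)
```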

In [31]:
# Calculate the North America Sales Percentage
north_america_sales = (north_america_usd / total_revenue_usd) * 100

print(f"North American Revenue was {north_america_sales:.2f}% of Popmart's Total Labubu Revenue in 2024")

North American Revenue was 13.25% of Popmart's Total Labubu Revenue in 2024
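
The same percentage can be computed for every region at once by dividing the whole column by its total. The sketch below rebuilds the USD figures from the dataset above so it runs on its own; the `Share (%)` column name is an assumption introduced here for illustration.

```python
import pandas as pd

# Same USD figures as the dataset loaded earlier in this notebook
sales = pd.DataFrame({
    "Region": [
        "Southeast Asia",
        "East Asia",
        "North America",
        "Europe and Other Markets",
    ],
    "Revenue (USD) (M)": [78.4, 67.0, 25.2, 19.6],
})

usd = sales["Revenue (USD) (M)"]

# Dividing a Series by a single number applies the division to every row.
# "Share (%)" is a hypothetical column name added for this example.
sales["Share (%)"] = (usd / usd.sum() * 100).round(2)

print(sales[["Region", "Share (%)"]])
```

The North America row should match the 13.25% calculated above.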


## Loading a DataFrame: CSV File

__Syntax__:
```python
dataframe = pd.read_csv(file_path)
```
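
A minimal sketch of what `pd.read_csv()` does with a small CSV. To keep the example self-contained, it reads from an in-memory string through `io.StringIO` instead of a file path; the CSV text is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """Region,Interest
Alpha,100
Beta,
Gamma,42
"""

example = pd.read_csv(io.StringIO(csv_text))

print(example.shape)                         # (3, 2)
print(example.dtypes["Interest"])            # float64
print(pd.isna(example.loc[1, "Interest"]))   # True: the empty field became NaN
```

The first line of the CSV supplies the column names, and an empty field is loaded as `NaN`, which is why a column of whole numbers can end up with a `float64` dtype.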

## Labubu Search Interest

<center><img src="../images/web/labubu_hot_not.avif" width=60%></center>
image credit: The Atlantic
<br/>
<br/>
Let's load some data about Labubu's Search Interest

In [34]:
# Create Data Frame
# Data Source: Google Trends

file = "../data/labubu_interest_by_region.csv"

labubu_interest = pd.read_csv(file)


## Initial DataFrame Inspection

In each code cell below, let's do some initial inspection and exploration of the data.


* __`df.info()`:__ to check data types and non-null values.
* __`df.head()`:__ to see the beginning of your data.
* __`df.tail()`:__ to see the end of your data.
* __`df.describe()`:__ for a statistical summary of numerical columns.

In [35]:
# Data Types and Information
labubu_interest.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Region    250 non-null    object 
 1   Interest  59 non-null     float64
dtypes: float64(1), object(1)
memory usage: 4.0+ KB
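
The `.info()` output shows that `Interest` has far fewer non-null values than there are rows, so most regions have no score. One way to count missing values directly is `.isna().sum()`. The sketch below uses a small hypothetical DataFrame, not the real search data.

```python
import pandas as pd

# Hypothetical data, for illustration only
trend = pd.DataFrame({
    "Region": ["Alpha", "Beta", "Gamma", "Delta"],
    "Interest": [100.0, float("nan"), 42.0, float("nan")],
})

missing = trend["Interest"].isna().sum()  # rows with no value
present = trend["Interest"].count()       # matches the Non-Null Count in .info()

print(missing, present)  # 2 2
```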


In [38]:
# Beginning of Data
labubu_interest.head()



|   | Region | Interest |
|---|--------|----------|
| 0 | Macao | NaN |
| 1 | Hong Kong | 100.0 |
| 2 | Singapore | 86.0 |
| 3 | Guam | NaN |
| 4 | Brunei | NaN |


In [39]:
# End of Data
labubu_interest.tail()



|     | Region | Interest |
|-----|--------|----------|
| 245 | Antarctica | NaN |
| 246 | Pitcairn Islands | NaN |
| 247 | Norfolk Island | NaN |
| 248 | Niue | NaN |
| 249 | Falkland Islands (Islas Malvinas) | NaN |


In [41]:
# Statistical Summary
labubu_interest.describe()



|       | Interest |
|-------|----------|
| count | 59.0 |
| mean  | 22.745763 |
| std   | 17.043143 |
| min   | 1.0 |
| 25%   | 13.5 |
| 50%   | 21.0 |
| 75%   | 29.0 |
| max   | 100.0 |


## Sort Data: View Top 10 Regions

Where is Labubu the most popular (in terms of Google Search)?

We'll need an additional method to make this happen: `df.sort_values()`.
 
__Syntax__:

```python
dataframe.sort_values(by=column_name, ascending=False).head(10)
```

What additional method will we use to view the Top 10 Regions after they have been sorted?

In [43]:
### Sort by Interest - Top 10

labubu_interest.sort_values(by="Interest", ascending=False).head(10)


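|    | Region | Interest |
|----|--------|----------|
| 1  | Hong Kong | 100.0 |
| 2  | Singapore | 86.0 |
| 7  | Australia | 42.0 |
| 9  | Poland | 39.0 |
| 10 | United Arab Emirates | 37.0 |
| 11 | Slovakia | 36.0 |
| 12 | United States | 36.0 |
| 14 | Malaysia | 35.0 |
| 16 | Czechia | 35.0 |
| 17 | Sweden | 32.0 |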


## Sort Data: View Bottom 10 Regions

Can we apply similar techniques to discover the bottom 10 regions for Labubu search interest?

Give it a try in the code cell below.

In [47]:
## Bottom 10 by Interest

labubu_interest.sort_values(by="Interest", ascending=False).tail(10)



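|     | Region | Interest |
|-----|--------|----------|
| 240 | Burundi | NaN |
| 241 | French Southern Territories | NaN |
| 242 | Palau | NaN |
| 243 | Wallis & Futuna | NaN |
| 244 | Montserrat | NaN |
| 245 | Antarctica | NaN |
| 246 | Pitcairn Islands | NaN |
| 247 | Norfolk Island | NaN |
| 248 | Niue | NaN |
| 249 | Falkland Islands (Islas Malvinas) | NaN |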
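
The output above contains only regions with no `Interest` value. By default, `sort_values()` places missing values at the end (`na_position="last"`) whichever direction you sort in, so `.tail(10)` returns those rows instead of the lowest scores. One possible fix, sketched below on a small hypothetical DataFrame, is to drop the missing rows first and then sort in ascending order.

```python
import pandas as pd

# Hypothetical data, for illustration only
trend = pd.DataFrame({
    "Region": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "Interest": [100.0, float("nan"), 42.0, 1.0, float("nan")],
})

# Drop rows without an Interest score, then sort from lowest to highest
lowest = (
    trend.dropna(subset=["Interest"])
         .sort_values(by="Interest", ascending=True)
         .head(2)
)

print(lowest["Region"].tolist())  # ['Delta', 'Gamma']
```

Applied to `labubu_interest`, the same pattern with `.head(10)` would list the ten lowest regions that actually have a score.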
