# **NBA Analysis: Extract, Transform, Load**

## Objectives

* Check the Copilot-generated data for NBA teams (city, conference and division), then combine it with a file containing the logo URLs into one file for later use in Power BI.

## Inputs

* Files used:
  * nba-cities-coordinates-2025.csv
  * nba-logo-urls-2025.csv


## Outputs

* File produced:
  * nba-cities-coordinates-logolinks.csv

## Additional Comments

* Note: Copilot generated data that was correct up until 2024. The Los Angeles Clippers have since moved out of the Staples Center, which they shared with the Los Angeles Lakers for over 20 years, and into their new arena. The coordinates were manually edited to reflect this.
* Note: Copilot was used to generate URLs for each team's logo; however, these did not display when the links were opened, so the links were added manually in a spreadsheet and exported to a CSV file.
* Note: The aim of this notebook is to clean the data and combine it into one CSV file with the fields "Team_Name", "City", "State", "Latitude", "Longitude", "Conference", "Division" and "Logo_URL".




---

# Section 1

# Import necessary libraries

In [2]:
import pandas as pd

# Load both CSV files

In [5]:
df1 = pd.read_csv('../data/inputs/raw/nba-cities-coordinates-2025.csv')
df2 = pd.read_csv('../data/inputs/raw/nba-logo-urls-2025.csv')

# Look at data

In [7]:
df1.head()


Unnamed: 0,Team_Name,City,State,Latitude,Longitude,Conference,Division
0,Atlanta Hawks,Atlanta,GA,33.749,-84.388,Eastern,Southeast
1,Boston Celtics,Boston,MA,42.3601,-71.0589,Eastern,Atlantic
2,Brooklyn Nets,Brooklyn,NY,40.6782,-73.9442,Eastern,Atlantic
3,Charlotte Hornets,Charlotte,NC,35.2271,-80.8431,Eastern,Southeast
4,Chicago Bulls,Chicago,IL,41.8781,-87.6298,Eastern,Central


In [8]:
df2.head()

Unnamed: 0,Team_Name,Logo_URL
0,Atlanta Hawks,https://upload.wikimedia.org/wikipedia/en/thum...
1,Boston Celtics,https://upload.wikimedia.org/wikipedia/en/thum...
2,Brooklyn Nets,https://upload.wikimedia.org/wikipedia/en/thum...
3,Charlotte Hornets,https://upload.wikimedia.org/wikipedia/en/thum...
4,Chicago Bulls,https://upload.wikimedia.org/wikipedia/en/thum...


# Look at shape

In [9]:
print(df1.shape)
print(df2.shape)

(30, 7)
(30, 2)

# Check data types

In [10]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Team_Name   30 non-null     object 
 1   City        30 non-null     object 
 2   State       30 non-null     object 
 3   Latitude    30 non-null     float64
 4   Longitude   30 non-null     float64
 5   Conference  30 non-null     object 
 6   Division    30 non-null     object 
dtypes: float64(2), object(5)
memory usage: 1.8+ KB


In [11]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Team_Name  30 non-null     object
 1   Logo_URL   30 non-null     object
dtypes: object(2)
memory usage: 612.0+ bytes


# Check null values

In [12]:
df1.isnull().sum()

Team_Name     0
City          0
State         0
Latitude      0
Longitude     0
Conference    0
Division      0
dtype: int64

In [13]:
df2.isnull().sum()

Team_Name    0
Logo_URL     0
dtype: int64
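The null check confirms that every `Logo_URL` is present, but not that each value is shaped like a link. Below is a minimal sketch of a structural check, assuming an `http`/`https` scheme and a host are what a usable link needs. The helper `looks_like_url`, the `URL_OK` column and the `example.com` addresses are hypothetical and not part of the original notebook. The check does not confirm that an image actually loads.

```python
from urllib.parse import urlparse

import pandas as pd


def looks_like_url(value):
    """Return True when value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical stand-in for df2; the second URL deliberately lacks a scheme.
logos = pd.DataFrame({
    "Team_Name": ["Atlanta Hawks", "Boston Celtics"],
    "Logo_URL": ["https://example.com/hawks.png", "example.com/celtics.png"],
})

# Flag each row so malformed links can be reviewed by hand.
logos["URL_OK"] = logos["Logo_URL"].map(looks_like_url)
print(logos[["Team_Name", "URL_OK"]])
```

Rows where `URL_OK` is `False` would be the ones to correct in the spreadsheet before merging.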

# Check for duplicates

In [14]:
df1.duplicated().sum()

0

In [15]:
df2.duplicated().sum()

0
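`duplicated()` with no arguments only flags rows that are identical in every column. Because the merge that follows joins on `Team_Name`, it can also help to check that column on its own and to compare the team names in the two files. The sketch below assumes small stand-in frames; the helper `key_report` and the `example.com` URLs are hypothetical, and the trailing space is a deliberate mismatch for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-ins for df1 and df2.
df1 = pd.DataFrame({
    "Team_Name": ["Atlanta Hawks", "Boston Celtics"],
    "City": ["Atlanta", "Boston"],
})
df2 = pd.DataFrame({
    "Team_Name": ["Atlanta Hawks", "Boston Celtics "],  # trailing space on purpose
    "Logo_URL": ["https://example.com/hawks.png", "https://example.com/celtics.png"],
})


def key_report(left, right, key="Team_Name"):
    """Summarise duplicate keys and keys that appear in only one frame."""
    left_keys, right_keys = set(left[key]), set(right[key])
    return {
        "duplicate_keys_left": int(left[key].duplicated().sum()),
        "duplicate_keys_right": int(right[key].duplicated().sum()),
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
    }


report = key_report(df1, df2)
print(report)

# Stripping whitespace resolves the mismatch in this toy example.
df2["Team_Name"] = df2["Team_Name"].str.strip()
clean_report = key_report(df1, df2)
print(clean_report)
```

Any name listed under `only_in_left` or `only_in_right` would be silently dropped by an inner merge, so an empty list on both sides is the result to look for.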

# Combine them into one DataFrame, using "Team_Name" as the common column in both files
# Note: Copilot was used to find out how to do this. It turned out to be similar to a SQL join, which I had used in the past

In [21]:
merged_df = pd.merge(df1, df2, on='Team_Name', how='inner')
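An inner merge drops any team whose name does not match exactly in both files, and it does so without warning. One way to make that visible is to run an outer merge first with pandas' `indicator` and `validate` options. This is a sketch on hypothetical three-row data, not the notebook's actual files; the `example.com` URLs are placeholders.

```python
import pandas as pd

# Hypothetical stand-ins: df2 is missing one team on purpose.
df1 = pd.DataFrame({
    "Team_Name": ["Atlanta Hawks", "Boston Celtics", "Brooklyn Nets"],
    "Conference": ["Eastern", "Eastern", "Eastern"],
})
df2 = pd.DataFrame({
    "Team_Name": ["Atlanta Hawks", "Boston Celtics"],
    "Logo_URL": ["https://example.com/hawks.png", "https://example.com/celtics.png"],
})

# validate raises MergeError if Team_Name repeats in either frame;
# indicator adds a "_merge" column: both, left_only or right_only.
checked = pd.merge(
    df1, df2, on="Team_Name", how="outer", validate="one_to_one", indicator=True
)

# Rows that an inner merge would have dropped.
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["Team_Name", "_merge"]])

# Keeping only the matched rows reproduces the inner-merge result.
merged_df = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(merged_df)
```

If `unmatched` is empty, the inner merge and the outer merge contain the same rows.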

# Check new data frame


In [22]:
merged_df.head()

Unnamed: 0,Team_Name,City,State,Latitude,Longitude,Conference,Division,Logo_URL
0,Atlanta Hawks,Atlanta,GA,33.749,-84.388,Eastern,Southeast,https://upload.wikimedia.org/wikipedia/en/thum...
1,Boston Celtics,Boston,MA,42.3601,-71.0589,Eastern,Atlantic,https://upload.wikimedia.org/wikipedia/en/thum...
2,Brooklyn Nets,Brooklyn,NY,40.6782,-73.9442,Eastern,Atlantic,https://upload.wikimedia.org/wikipedia/en/thum...
3,Charlotte Hornets,Charlotte,NC,35.2271,-80.8431,Eastern,Southeast,https://upload.wikimedia.org/wikipedia/en/thum...
4,Chicago Bulls,Chicago,IL,41.8781,-87.6298,Eastern,Central,https://upload.wikimedia.org/wikipedia/en/thum...


# Save to a new CSV file

In [25]:
merged_df.to_csv('../data/outputs/nba-cities-coordinates-logolinks.csv', index=False)
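After saving, a quick round trip can confirm that the exported file has the eight fields listed in the notes at the top, in that order. The sketch below writes to an in-memory buffer instead of the real output path so that it runs anywhere. The single row uses values from the `head()` output above, except for the logo URL, which is a placeholder; `EXPECTED_COLUMNS` is a name introduced here for illustration.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "Team_Name", "City", "State", "Latitude",
    "Longitude", "Conference", "Division", "Logo_URL",
]

# One-row stand-in for merged_df; the Logo_URL is a placeholder.
merged_df = pd.DataFrame([{
    "Team_Name": "Atlanta Hawks", "City": "Atlanta", "State": "GA",
    "Latitude": 33.749, "Longitude": -84.388,
    "Conference": "Eastern", "Division": "Southeast",
    "Logo_URL": "https://example.com/hawks.png",
}])

# Write without the index, as in the cell above, then read it back.
buffer = io.StringIO()
merged_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

columns_ok = list(reloaded.columns) == EXPECTED_COLUMNS
rows_ok = len(reloaded) == len(merged_df)
print(columns_ok, rows_ok)
```

With the real file, the `io.StringIO` buffer would be replaced by the output path, and the row count would be compared against the 30 teams.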

---

# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the otherwise empty try block is valid Python
except Exception as e:
  print(e)
