# Intro

Based on David Hay's notebook found [here](https://github.com/callysto/basketball-and-data-science/blob/main/notebooks/03-cleaning-and-visualizing.ipynb).
- <david.hay@cybera.ca>
- Peter's [Colab copy](https://colab.research.google.com/drive/1YDCS85e1QLn2yayh9GzhTCdNfTIN1PXP?usp=sharing)

# Python Intro (Cursory)

*Prompt: Tell me why Python is a great language for learning about data analysis.*

Python is a great language for learning about data analysis for many reasons. Here are some of them:

- Python is **easy to learn** and **readable**. It has a simple and English-like syntax that makes it beginner-friendly and intuitive. You don't need to know a lot of technical details or math to start working with Python²⁶.
- Python is **versatile** and **flexible**. You can use it for many different purposes, such as web development, artificial intelligence, internet of things, etc. You can also write scripts, run interactive notebooks, or create applications with Python¹².
- Python has a **huge collection of libraries** that can help you with data analysis. These are pre-written code packages that can handle common tasks, such as data manipulation, visualization, statistics, machine learning, etc. Some of the most popular libraries are Pandas, NumPy, Matplotlib, SciPy, Scikit-learn, and more¹²³⁵.
- Python works well with **Jupyter Notebook**, a popular data analysis tool. It is installed separately rather than built into Python, and it gives you an interactive environment where you can write and run Python code, display graphs and charts, document your findings, and share your work with others³⁵.

These are just some of the reasons why Python is a great choice for data analysis. Have fun! 😊

Source: Conversation with Bing, 2023-05-19

(1) Why Is Python a Great Choice For Data Analysis? - Ideamotive. https://www.ideamotive.co/blog/python-for-data-analysis.

(2) Python for Data Science - GeeksforGeeks. https://www.geeksforgeeks.org/python-for-data-science/.

(3) Why Python is Essential for Data Analysis? - Section. https://www.section.io/engineering-education/why-python-is-essential-for-data-analysis/.

(4) Why You Should Learn Python for Data Science & Analysis - Codecademy. https://www.codecademy.com/resources/blog/why-you-should-learn-python-for-data-science/.

(5) Learning Python for Data Analysis - Columbia Engineering Boot Camps. https://bootcamp.cvn.columbia.edu/blog/learning-python-for-data-analysis/.

(6) Why Python Is a Great Choice for Data Analysis - AlixaProDev. https://www.alixaprodev.com/why-python-is-a-great-choice-for-data-analysis/.

## TODO

- give examples of where Python can be run from (online and downloaded)
- create an example program that shows a working Python program that this tutorial will build up to. 

# Libraries

*Prompt: Give some examples of common libraries and what they are used for.*

Python is a versatile and powerful programming language that has many libraries for different purposes. Some common Python libraries are:

- **Requests**¹: This library makes it easy to send HTTP requests and interact with web APIs. You can use it to download files, scrape websites, or access online data sources.
- **NumPy**³: This library provides fast and efficient numerical computing with arrays and matrices. You can use it to perform mathematical operations, linear algebra, statistics, or image processing.
- **Pandas**³: This library offers data analysis and manipulation tools for working with tabular and structured data. You can use it to read, write, filter, sort, group, or reshape data from various formats such as CSV, Excel, JSON, or SQL.
- **Matplotlib**⁴: This library is a 2-D plotting library that allows you to create various types of graphs and charts. You can use it to visualize data, explore trends, or communicate results.
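
As a rough illustration of how two of the libraries above divide the work, the sketch below runs a small calculation with NumPy and then puts the same numbers into a Pandas table. The team labels and points-per-game values are made up for the example; they are not real statistics.

```python
import numpy as np
import pandas as pd

# Hypothetical points-per-game values, invented for illustration only
points = np.array([110.0, 108.0, 118.0])

# NumPy: math on the whole array at once
average = points.mean()
above_average = points > average  # element-wise comparison

# Pandas: the same numbers as a labelled table
df = pd.DataFrame({"Team": ["A", "B", "C"], "PPG": points})
top = df.sort_values("PPG", ascending=False).iloc[0]

print(average)
print(above_average)
print(top["Team"])
```

NumPy handles the raw number crunching, while Pandas adds labels and table-style operations such as sorting on top of it.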

Some Python libraries that might be worth investigating are:

- **PyGame**: A framework for making video games in Python. You can use it to create interactive graphics, sound, animation, or user input for your own games.
- **Turtle**: A module in Python's standard library for drawing shapes and patterns using a virtual turtle. You can use it to learn programming concepts, create art, or have fun with geometry.
- **NLTK**: A toolkit for natural language processing (NLP). You can use it to analyze text, extract information, generate sentences, or build chatbots.
- **OpenCV**: A library for computer vision and machine learning. You can use it to process images, detect faces, recognize objects, or apply filters.

Here are some examples of how these libraries might be used:

- PyGame: You can make a simple game like Pong or Snake by using PyGame's sprites, surfaces, events, and sounds. You can also follow tutorials or online courses to learn how to make more complex games like Flappy Bird or Tetris.
- Turtle: You can draw shapes like circles, squares, triangles, or polygons by using Turtle's methods like forward(), backward(), left(), right(), or penup(). You can also use loops or functions to create patterns like spirals, stars, or fractals.
- NLTK: You can analyze text like tweets, reviews, or news articles by using NLTK's modules like `nltk.tokenize`, `nltk.tag`, `nltk.chunk`, or `nltk.sentiment`. Its `ngrams()` helper splits text into short word sequences, which is a starting point for simple text generators. Tools such as `markovify` or GPT-2 are separate projects, not part of NLTK.
- OpenCV: You can process images like selfies, memes, or cartoons by using OpenCV's functions like imread(), imshow(), cvtColor(), or resize(). You can also detect faces by loading a pretrained cascade file such as `haarcascade_frontalface_default.xml` or `lbpcascade_frontalface_improved.xml` into a `CascadeClassifier`.

Source: Conversation with Bing, 2023-05-19

(1) 24 Best Python Libraries You Should Check in 2023 - Hackr. https://hackr.io/blog/best-python-libraries.

(2) Most Popular Python Packages in 2021 | LearnPython.com. https://learnpython.com/blog/most-popular-python-packages/.

(3) Top 8 Python Libraries for Data Visualization - GeeksforGeeks. https://www.geeksforgeeks.org/top-8-python-libraries-for-data-visualization/.

(4) The Python Standard Library — Python 3.11.3 documentation. https://docs.python.org/3/library/index.html.


---



The cell below imports the Pandas library and gives it the alias `pd`. Pandas is a Python library for data manipulation and analysis. With the alias, you can reach its functions and classes without typing the full name every time: for example, you can create a data frame with `pd.DataFrame()` instead of `pandas.DataFrame()`. This keeps your code shorter and easier to read.

In [2]:
import pandas as pd
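
To see that an alias is only a second name for the same library, the short sketch below imports Pandas both ways and compares them. The data frame values are placeholders chosen for the example.

```python
import pandas
import pandas as pd

# Both names refer to the very same module object
print(pd is pandas)
print(pd.DataFrame is pandas.DataFrame)

# A tiny data frame built through the alias (placeholder values)
df = pd.DataFrame({"Team": ["A", "B"], "Wins": [50, 42]})
print(df.shape)
```

In practice you only need the `import pandas as pd` line; the plain `import pandas` is here just for the comparison.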

# Let's Get Some Data!

## TODO

Need overview here

This code uses the `pd.read_html()` function to read the HTML tables from the given URL and store them in a variable called `page`. The URL points to a web page with the NBA team standings for the 2022-2023 season. `pd.read_html()` returns a list of data frames, one for each table it finds on the page, so you can access each one by its index, such as `page[0]` for the first table. From there you can analyze the data with Pandas.

In [3]:
page = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2023_standings.html')

In [None]:
# Get help on pd.read_html
help(pd.read_html)
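
If you want to try `pd.read_html()` without depending on a live website, the sketch below feeds it a small, made-up HTML snippet instead of a URL. The table contents are invented, and the example assumes an HTML parser such as `lxml` is installed alongside Pandas.

```python
from io import StringIO

import pandas as pd

# A made-up page with two small tables (not real standings)
html = """
<table>
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>A</td><td>50</td></tr>
  <tr><td>B</td><td>42</td></tr>
</table>
<table>
  <tr><th>Team</th><th>Arena</th></tr>
  <tr><td>A</td><td>First Arena</td></tr>
</table>
"""

# read_html returns a plain Python list with one data frame per <table>
tables = pd.read_html(StringIO(html))
print(type(tables))
print(len(tables))
print(tables[0])

# match= keeps only the tables whose text matches the given string or regex
arena_tables = pd.read_html(StringIO(html), match="Arena")
print(len(arena_tables))
```

The `match` argument can save some searching on pages that contain many tables.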

# Quick Check of the Data We Imported

In [None]:
print(page)
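
Printing the whole list can produce a lot of output. One possible alternative is to print a short summary of each table first. The sketch below uses a stand-in `page` list with placeholder values so it runs without a network connection; with the real `page` you would skip the stand-in.

```python
import pandas as pd

# Stand-in for `page`: a list of data frames with placeholder values
page = [
    pd.DataFrame({"Team": ["A", "B"], "Wins": [50, 42]}),
    pd.DataFrame({"Team": ["C", "D", "E"], "Wins": [48, 40, 35]}),
]

print(len(page))  # how many tables were found

# One summary line per table: its position, size, and column names
for index, table in enumerate(page):
    rows, columns = table.shape
    print(index, rows, columns, list(table.columns))

# head() shows only the first few rows of one table
print(page[0].head())
```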

## Assignment 

Get data from https://en.wikipedia.org/wiki/National_Basketball_Association so we can analyze Arena Capacity.

### TODO

Create notebook stub for them to experiment with

### Solution

In [None]:
import pandas as pd
page = pd.read_html('https://en.wikipedia.org/wiki/National_Basketball_Association')
print(page)
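
A page like this one contains many tables, so a next step might be to locate the one that mentions capacity. The sketch below is one way to do that. The helper `tables_with_column` is a hypothetical name introduced here, and the stand-in `page` holds placeholder values rather than the real Wikipedia data, whose columns may be laid out differently.

```python
import pandas as pd


def tables_with_column(tables, keyword):
    """Return (index, table) pairs whose column labels contain keyword."""
    return [
        (index, table)
        for index, table in enumerate(tables)
        if any(keyword.lower() in str(label).lower() for label in table.columns)
    ]


# Stand-in for the list returned by pd.read_html (placeholder values)
page = [
    pd.DataFrame({"Year": [2001, 2002], "Champion": ["A", "B"]}),
    pd.DataFrame({
        "Team": ["A", "B"],
        "Arena": ["First Arena", "Second Arena"],
        "Capacity": ["19,000", "18,500"],
    }),
]

matches = tables_with_column(page, "capacity")
index, arenas = matches[0]

# Capacities scraped from the web are text like "19,000"; make them numbers
arenas["Capacity"] = pd.to_numeric(arenas["Capacity"].str.replace(",", ""))
print(index, arenas["Capacity"].mean())
```

Converting the column to numbers is what makes calculations such as the mean possible.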

**Question**

What other data might be interesting for teenagers to analyze?

# Python Lists (Intro to Indexing)

Give some examples. Extract one of the tables from the `page` variable.

In [None]:
for table in page:
    display(table)
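
Because `page` is an ordinary Python list, the usual list indexing rules apply to it. The sketch below shows those rules on a short list of team names, which is easier to read than a list of tables.

```python
teams = [
    "Boston Celtics",
    "Brooklyn Nets",
    "New York Knicks",
    "Philadelphia 76ers",
    "Toronto Raptors",
]

print(teams[0])    # first item: indexes start at 0
print(teams[1])    # second item
print(teams[-1])   # negative indexes count from the end
print(teams[1:3])  # a slice: the items at index 1 and 2
print(len(teams))  # number of items in the list
```

The same syntax works on the list of tables: `page[0]` is the first table and `page[-1]` is the last one.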

# Playground

In [5]:
# Import libraries
import requests
import pandas as pd
import matplotlib.pyplot as plt

# Specify URL
url = "https://en.wikipedia.org/wiki/National_Basketball_Association"

# Read table data from HTML using pandas
dfs = pd.read_html(url)

# for n, df in enumerate(dfs):
#     print('Table', n, '\n\n', df)

print(dfs[3])
# # Select the data frame that contains the table with class "wikitable sortable"
# df = dfs[1]

# # Rename the columns
# df.columns = ["Team", "Division", "City", "Arena", "Capacity", "Joined"]

# # Drop the rows with NaN values
# df = df.dropna()

# # Convert capacity column to numeric values
# df["Capacity"] = pd.to_numeric(df["Capacity"].str.replace(",", ""))

# # Plot data frame as a bar chart
# df.plot(x="Team", y="Capacity", kind="bar", figsize=(15, 10), title="NBA Teams Capacity")
# plt.show()


              Division                    Team                    Location  \
    Eastern Conference      Eastern Conference          Eastern Conference   
0             Atlantic          Boston Celtics       Boston, Massachusetts   
1             Atlantic           Brooklyn Nets     New York City, New York   
2             Atlantic         New York Knicks     New York City, New York   
3             Atlantic      Philadelphia 76ers  Philadelphia, Pennsylvania   
4             Atlantic         Toronto Raptors            Toronto, Ontario   
5              Central           Chicago Bulls           Chicago, Illinois   
6              Central     Cleveland Cavaliers             Cleveland, Ohio   
7              Central         Detroit Pistons           Detroit, Michigan   
8              Central          Indiana Pacers       Indianapolis, Indiana   
9              Central         Milwaukee Bucks        Milwaukee, Wisconsin   
10           Southeast           Atlanta Hawks            Atlant