**Pandas** is a powerful and versatile Python library designed for data manipulation and analysis. It provides data structures and functions for efficient operations on structured data. Here are the key points about Pandas:

1. **What is Pandas?**
   - **Pandas** simplifies tasks related to data manipulation in Python.
   - It is built on top of the **NumPy** library.
   - Particularly well-suited for working with **tabular data**, such as spreadsheets or SQL tables.
   - Essential for data analysts, scientists, and engineers dealing with structured data.

2. **What Can You Do with Pandas?**
   - Clean, merge, and join datasets.
   - Handle missing data (represented as **NaN**).
   - Insert and delete columns in **DataFrames**.
   - Perform powerful **group-by** operations.
   - Use Pandas data as input for plotting with **Matplotlib**, statistical analysis in **SciPy**, and machine learning algorithms in **Scikit-learn**.

3. **Pandas Data Structures:**
   - **Series**: A one-dimensional labeled array that can hold data of any type (integer, string, float, Python objects, etc.). Think of it as a column in an Excel sheet.
   - **DataFrame**: A two-dimensional labeled data structure, similar to a table, where rows and columns can be indexed. Built on top of NumPy arrays.

4. **Getting Started with Pandas:**
   - **Installation**: Ensure Pandas is installed using the following command:
     ```
     pip install pandas
     ```
   - **Importing**: Import Pandas in your Python script:
     ```python
     import pandas as pd
     ```
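The two data structures from item 3 can be sketched as follows. This is a minimal illustration only: the variable names and the sample values are assumptions chosen for the example, not part of any real dataset.

```python
import pandas as pd

# A Series: one-dimensional, labeled values (here the labels are team names).
goals = pd.Series([2, 5, 1], index=["Brentford", "Man United", "Burnley"], name="home_goals")

# A DataFrame: two-dimensional; each column is a Series sharing the same row index.
matches = pd.DataFrame(
    {
        "home_team": ["Brentford", "Man United", "Burnley"],
        "away_team": ["Arsenal", "Leeds", "Brighton"],
        "home_goals": [2, 5, 1],
        "away_goals": [0, 1, 2],
    }
)

print(goals["Man United"])           # label-based access on a Series
print(matches.shape)                 # (number of rows, number of columns)
print(type(matches["home_goals"]))   # selecting one column yields a Series
```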

Remember, Pandas makes data manipulation and analysis easier, whether you're cleaning messy data, exploring datasets, or performing complex operations. 🐼📊

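As a rough illustration of the operations listed in item 2 (missing data, group-by, merging, and inserting or deleting columns), the sketch below works on two tiny made-up tables. The team labels, stadium names, and column names are all assumptions for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data only; names and values are invented for this sketch.
results = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B"],
        "goals": [2.0, np.nan, 1.0, 3.0],
    }
)
stadiums = pd.DataFrame({"team": ["A", "B"], "stadium": ["North Park", "South Park"]})

# Handle missing data: count the NaN values, then fill them with 0.
n_missing = int(results["goals"].isna().sum())
filled = results.fillna({"goals": 0.0})

# Group-by: total goals per team.
goals_per_team = filled.groupby("team")["goals"].sum()

# Merge (an SQL-style left join) on the shared "team" column.
merged = filled.merge(stadiums, on="team", how="left")

# Insert a column, then delete another one.
merged["scored"] = merged["goals"] > 0
merged = merged.drop(columns=["stadium"])

print(n_missing)
print(goals_per_team)
print(merged)
```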
For more detailed information, you can refer to the official [Pandas documentation](https://pandas.pydata.org/docs/getting_started/overview.html).

Source: Conversation with Bing, 13/03/2024
(1) Package overview — pandas 2.2.1 documentation. https://pandas.pydata.org/docs/getting_started/overview.html.
(2) Pandas Introduction - GeeksforGeeks. https://www.geeksforgeeks.org/introduction-to-pandas-in-python/.
(3) Pandas Introduction - W3Schools. https://www.w3schools.com/python/pandas/pandas_intro.asp.
(4) Introduction to Pandas - Programiz. https://www.programiz.com/python-programming/pandas/introduction.

In [1]:
import pandas as pd

# Read a .csv from a URL with Pandas

Target website: https://www.football-data.co.uk/data.php

In [2]:
# Read one CSV file (division E0, season 2021/22) directly from the website
df_premier21 = pd.read_csv('https://www.football-data.co.uk/mmz4281/2122/E0.csv')

In [3]:
# showing dataframe
df_premier21

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,13/08/2021,20:00,Brentford,Arsenal,2,0,H,1,0,...,1.62,0.50,1.75,2.05,1.81,2.13,2.05,2.17,1.80,2.09
1,E0,14/08/2021,12:30,Man United,Leeds,5,1,H,1,0,...,2.25,-1.00,2.05,1.75,2.17,1.77,2.19,1.93,2.10,1.79
2,E0,14/08/2021,15:00,Burnley,Brighton,1,2,A,1,0,...,1.62,0.25,1.79,2.15,1.81,2.14,1.82,2.19,1.79,2.12
3,E0,14/08/2021,15:00,Chelsea,Crystal Palace,3,0,H,2,0,...,1.94,-1.50,2.05,1.75,2.12,1.81,2.16,1.93,2.06,1.82
4,E0,14/08/2021,15:00,Everton,Southampton,3,1,H,0,1,...,1.67,-0.50,2.05,1.88,2.05,1.88,2.08,1.90,2.03,1.86
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,E0,22/05/2022,16:00,Crystal Palace,Man United,1,0,H,1,0,...,2.04,0.25,1.68,2.15,1.74,2.23,1.88,2.25,1.74,2.16
376,E0,22/05/2022,16:00,Leicester,Southampton,4,1,H,0,0,...,2.63,-0.75,1.83,2.07,1.88,2.03,1.94,2.26,1.87,2.01
377,E0,22/05/2022,16:00,Liverpool,Wolves,3,1,H,1,1,...,3.28,-2.50,2.02,1.77,2.06,1.83,2.19,1.99,2.07,1.80
378,E0,22/05/2022,16:00,Man City,Aston Villa,3,2,H,0,1,...,3.36,-2.25,2.06,1.84,2.05,1.86,2.09,2.03,2.01,1.87

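`pd.read_csv` accepts file-like objects as well as URLs, so the same call can be tried offline. The sketch below copies three rows from the output above into a string (trimmed to a few columns), loads only selected columns with `usecols`, and converts the day-first `Date` strings into real timestamps. The variable names `csv_text` and `sample` are assumptions for the example.

```python
import io

import pandas as pd

# A few rows copied from the output above, trimmed to eight columns for brevity.
csv_text = """Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR
E0,13/08/2021,20:00,Brentford,Arsenal,2,0,H
E0,14/08/2021,12:30,Man United,Leeds,5,1,H
E0,14/08/2021,15:00,Burnley,Brighton,1,2,A
"""

# usecols limits which columns are loaded.
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"],
)

# The dates are DD/MM/YYYY strings; convert them with an explicit format.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

print(sample.dtypes)
print(sample)
```

Stating the date format explicitly avoids any ambiguity between day-first and month-first dates such as 13/08/2021 versus 08/13/2021.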

In [4]:
# rename columns
df_premier21 = df_premier21.rename(columns={'Date':'SHHHdate',
                                            'HomeTeam':'home_team',
                                            'AwayTeam':'away_team',
                                            'FTHG': 'home_goals',
                                            'FTAG': 'away_goals'})

In [5]:
# show dataframe
df_premier21

Unnamed: 0,Div,SHHHdate,Time,home_team,away_team,home_goals,away_goals,FTR,HTHG,HTAG,...,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,13/08/2021,20:00,Brentford,Arsenal,2,0,H,1,0,...,1.62,0.50,1.75,2.05,1.81,2.13,2.05,2.17,1.80,2.09
1,E0,14/08/2021,12:30,Man United,Leeds,5,1,H,1,0,...,2.25,-1.00,2.05,1.75,2.17,1.77,2.19,1.93,2.10,1.79
2,E0,14/08/2021,15:00,Burnley,Brighton,1,2,A,1,0,...,1.62,0.25,1.79,2.15,1.81,2.14,1.82,2.19,1.79,2.12
3,E0,14/08/2021,15:00,Chelsea,Crystal Palace,3,0,H,2,0,...,1.94,-1.50,2.05,1.75,2.12,1.81,2.16,1.93,2.06,1.82
4,E0,14/08/2021,15:00,Everton,Southampton,3,1,H,0,1,...,1.67,-0.50,2.05,1.88,2.05,1.88,2.08,1.90,2.03,1.86
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,E0,22/05/2022,16:00,Crystal Palace,Man United,1,0,H,1,0,...,2.04,0.25,1.68,2.15,1.74,2.23,1.88,2.25,1.74,2.16
376,E0,22/05/2022,16:00,Leicester,Southampton,4,1,H,0,0,...,2.63,-0.75,1.83,2.07,1.88,2.03,1.94,2.26,1.87,2.01
377,E0,22/05/2022,16:00,Liverpool,Wolves,3,1,H,1,1,...,3.28,-2.50,2.02,1.77,2.06,1.83,2.19,1.99,2.07,1.80
378,E0,22/05/2022,16:00,Man City,Aston Villa,3,2,H,0,1,...,3.36,-2.25,2.06,1.84,2.05,1.86,2.09,2.03,2.01,1.87

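`rename` returns a new DataFrame rather than changing the original, which is why the cell above assigns the result back to `df_premier21`. The sketch below shows this on a one-row stand-in frame (the names `small`, `mapping`, and `renamed` are assumptions), and also shows the `errors="raise"` option, which turns a mistyped column name into an error instead of silently ignoring it.

```python
import pandas as pd

# One-row stand-in for the real data.
small = pd.DataFrame({"HomeTeam": ["Brentford"], "AwayTeam": ["Arsenal"], "FTHG": [2], "FTAG": [0]})

mapping = {
    "HomeTeam": "home_team",
    "AwayTeam": "away_team",
    "FTHG": "home_goals",
    "FTAG": "away_goals",
}
renamed = small.rename(columns=mapping)

print(list(small.columns))    # the original frame keeps its old names
print(list(renamed.columns))  # the returned frame has the new names

# By default, keys that do not match any column are ignored.
# With errors="raise", a typo such as "HomeTeem" raises a KeyError.
typo_detected = False
try:
    small.rename(columns={"HomeTeem": "home_team"}, errors="raise")
except KeyError as exc:
    typo_detected = True
    print("rename failed:", exc)
```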

Delete the last row of data.


In [6]:
# delete the last row
df_premier21 = df_premier21.drop(379)
df_premier21

Unnamed: 0,Div,SHHHdate,Time,home_team,away_team,home_goals,away_goals,FTR,HTHG,HTAG,...,AvgC<2.5,AHCh,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA
0,E0,13/08/2021,20:00,Brentford,Arsenal,2,0,H,1,0,...,1.62,0.50,1.75,2.05,1.81,2.13,2.05,2.17,1.80,2.09
1,E0,14/08/2021,12:30,Man United,Leeds,5,1,H,1,0,...,2.25,-1.00,2.05,1.75,2.17,1.77,2.19,1.93,2.10,1.79
2,E0,14/08/2021,15:00,Burnley,Brighton,1,2,A,1,0,...,1.62,0.25,1.79,2.15,1.81,2.14,1.82,2.19,1.79,2.12
3,E0,14/08/2021,15:00,Chelsea,Crystal Palace,3,0,H,2,0,...,1.94,-1.50,2.05,1.75,2.12,1.81,2.16,1.93,2.06,1.82
4,E0,14/08/2021,15:00,Everton,Southampton,3,1,H,0,1,...,1.67,-0.50,2.05,1.88,2.05,1.88,2.08,1.90,2.03,1.86
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
374,E0,22/05/2022,16:00,Chelsea,Watford,2,1,H,1,0,...,2.78,-2.00,1.89,2.01,1.93,1.96,1.96,2.10,1.89,1.98
375,E0,22/05/2022,16:00,Crystal Palace,Man United,1,0,H,1,0,...,2.04,0.25,1.68,2.15,1.74,2.23,1.88,2.25,1.74,2.16
376,E0,22/05/2022,16:00,Leicester,Southampton,4,1,H,0,0,...,2.63,-0.75,1.83,2.07,1.88,2.03,1.94,2.26,1.87,2.01
377,E0,22/05/2022,16:00,Liverpool,Wolves,3,1,H,1,1,...,3.28,-2.50,2.02,1.77,2.06,1.83,2.19,1.99,2.07,1.80

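`drop(379)` works here because the last row happens to carry the label 379. A sketch of two ways to remove the last row without hard-coding its label is shown below, using a small stand-in frame (`frame`, `by_label`, and `by_position` are names assumed for the example).

```python
import pandas as pd

# Small stand-in frame with the default 0..n-1 index.
frame = pd.DataFrame(
    {"home_team": ["Brentford", "Man United", "Burnley"], "home_goals": [2, 5, 1]}
)

# Drop by the label of the last row, whatever that label is.
by_label = frame.drop(frame.index[-1])

# Or slice by position: keep everything except the final row.
by_position = frame.iloc[:-1]

print(by_label.equals(by_position))
print(len(frame), len(by_label))
```

Note that `drop` removes every row carrying the given label, so with duplicate index labels the positional `iloc[:-1]` form is the safer choice.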

# Read HTML

Target website: https://en.wikipedia.org/wiki/List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)

In [7]:
simpsons = pd.read_html('https://en.wikipedia.org/wiki/List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)')

In [8]:
simpsons[1].head()

Unnamed: 0,No. overall,No. in season,Title,Directed by,Written by,Original air date,Prod. code,U.S. viewers (millions)
0,1,1,"""Simpsons Roasting on an Open Fire""",David Silverman,Mimi Pond,"December 17, 1989",7G08,26.7[47]
1,2,2,"""Bart the Genius""",David Silverman,Jon Vitti,"January 14, 1990",7G02,24.5[47]
2,3,3,"""Homer's Odyssey""",Wes Archer,Jay Kogen & Wallace Wolodarsky,"January 21, 1990",7G03,27.5[48]
3,4,4,"""There's No Disgrace Like Home""",Gregg Vanzo & Kent Butterworth,Al Jean & Mike Reiss,"January 28, 1990",7G04,20.2[49]
4,5,5,"""Bart the General""",David Silverman,John Swartzwelder,"February 4, 1990",7G05,27.1[50]

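Tables scraped from Wikipedia often carry footnote markers such as `[47]` inside the cells, which leaves numeric columns stored as text. The sketch below rebuilds three rows from the output above (trimmed to two columns) and cleans them. The new column name `viewers_millions` is an assumption for the example.

```python
import pandas as pd

# Rows copied from the output above, trimmed to two columns.
episodes = pd.DataFrame(
    {
        "Title": [
            '"Simpsons Roasting on an Open Fire"',
            '"Bart the Genius"',
            '"Homer\'s Odyssey"',
        ],
        "U.S. viewers (millions)": ["26.7[47]", "24.5[47]", "27.5[48]"],
    }
)

# Remove bracketed footnote markers such as [47], then convert the text to numbers.
viewers = episodes["U.S. viewers (millions)"].str.replace(r"\[\d+\]", "", regex=True)
episodes["viewers_millions"] = pd.to_numeric(viewers, errors="coerce")

# Strip the literal quotation marks that surround each title.
episodes["Title"] = episodes["Title"].str.strip('"')

print(episodes[["Title", "viewers_millions"]])
print(episodes["viewers_millions"].dtype)
```

With `errors="coerce"`, any value that still cannot be parsed becomes `NaN` instead of raising an error.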



In [9]:
# Read the Wikipedia page
url = "https://en.wikipedia.org/wiki/Demon_Slayer:_Kimetsu_no_Yaiba"
tables = pd.read_html(url)


In [13]:
# Check the number of tables
num_tables = len(tables)
print(f"Number of tables found: {num_tables}")

# Inspect one of the tables; index 1 holds the awards table (adjust the index as needed)
relevant_table = tables[1]
relevant_table.head()


Number of tables found: 8


Unnamed: 0,Year,Award,Category,Result,Ref.
0,2017,1st Tsutaya Comic Awards,Anime Hope Division,3rd place,[178]
1,2018,Da Vinci 18th Annual Book of the Year,Book of the Year,30th place,[179]
2,2019,Da Vinci 19th Annual Book of the Year,Book of the Year,10th place,[180]
3,2020,BookWalker Awards,Grand Prize,Won,[181]
4,2020,Piccoma Awards,Luna Category,Won,[182]

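Because `pd.read_html` returns a list of DataFrames, picking a table by position (such as `tables[1]`) can break if the page layout changes. One possible alternative is to select the table by its column names. The helper `find_table` below is hypothetical, and `tables_demo` is a stand-in for the list that `pd.read_html` would return; its first entry is a made-up placeholder table.

```python
import pandas as pd


def find_table(tables, required_columns):
    """Return the first DataFrame whose columns include every required name.

    Hypothetical helper for illustration; it assumes plain (non-MultiIndex) headers.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(set(map(str, table.columns))):
            return table
    raise ValueError(f"No table contains the columns {sorted(required)}")


# Stand-in for the list returned by pd.read_html.
tables_demo = [
    pd.DataFrame({"Key": ["placeholder"], "Value": ["placeholder"]}),
    pd.DataFrame(
        {
            "Year": [2017, 2018],
            "Award": ["1st Tsutaya Comic Awards", "Da Vinci 18th Annual Book of the Year"],
            "Result": ["3rd place", "30th place"],
        }
    ),
]

awards = find_table(tables_demo, ["Year", "Award", "Result"])
print(awards)
```

`pd.read_html` also accepts a `match` argument that keeps only tables containing the given text, which can narrow the list before any selection step.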