# ⚽ Football Data Analysis - Free Sample Demo

Welcome to the **Football/Soccer Match Database** demo. This notebook demonstrates how to load, clean, and analyze the relational dataset provided in this repository.

### 🎯 Objective
We will load match results from the 2015-2016 La Liga season, merge them with team information, and visualize the top-scoring teams at home.

---
**🔗 Useful Links:**
* [📂 Get the Full Dataset (2000-Present)](TU_LINK_A_PATREON_AQUI) - *Support us and get weekly updates!*
* [🐛 Report an Issue](https://github.com/mzafram2001/football-dataset-fver/issues)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Configuration for clearer charts
plt.style.use('ggplot')
%matplotlib inline

print("✅ Libraries loaded successfully!")

In [None]:
# 1. Define the Source URL (Raw GitHub Links)
# We use the raw version to load data directly from the cloud
base_url = "https://raw.githubusercontent.com/mzafram2001/football-dataset-fver/refs/heads/main/data/"

print("⏳ Loading datasets from GitHub...")

try:
    # Load the CSVs
    df_results = pd.read_csv(base_url + "results.csv")
    df_teams = pd.read_csv(base_url + "teams.csv")

    # 2. Data Cleaning (CRITICAL STEP)
    # The raw CSVs often contain leading/trailing whitespace in column names and values.
    # We clean this to ensure smooth merging.
    
    # Clean Column Names
    df_results.columns = df_results.columns.str.strip()
    df_teams.columns = df_teams.columns.str.strip()
    
    # Clean String Values in specific columns
    df_teams['name'] = df_teams['name'].str.strip()
    
    print("✅ Data loaded and cleaned successfully!")
    print(f"   - Matches loaded: {df_results.shape[0]}")
    print(f"   - Teams loaded: {df_teams.shape[0]}")

except Exception as e:
    print(f"❌ Error loading data: {e}")
    # Re-raise so later cells do not run against missing DataFrames
    raise
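
The cleaning step above strips the column names and the `name` column only. If other text columns also carry stray whitespace, the same idea can be applied to every string column at once. The sketch below uses a made-up toy table (its values are not from the real `teams.csv`) and a hypothetical helper called `strip_whitespace`:

```python
import pandas as pd

# Toy stand-in for a CSV with stray whitespace; values are illustrative only
raw_teams = pd.DataFrame({
    " id ": [1, 2],
    "name ": ["  Team A", "Team B  "],
    " code": [" TMA ", "TMB"],
})

def strip_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with whitespace stripped from column names and text cells."""
    cleaned = df.copy()
    cleaned.columns = cleaned.columns.str.strip()
    for col in cleaned.columns:
        # Only text columns are touched; numeric columns are left unchanged
        if pd.api.types.is_string_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].str.strip()
    return cleaned

clean_teams = strip_whitespace(raw_teams)
print(clean_teams)
```

Returning a copy keeps the raw frame available for comparison, which can help when checking how much whitespace the source files actually contain.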

### 🔗 Relational Merge
This dataset is **normalized**. Instead of repeating the team name in every single match row, we use IDs (`home_id`, `away_id`) that link to the `teams.csv` table.

Let's **merge** the tables to get the human-readable Team Names.

In [None]:
# Merge 'results' with 'teams'
# We want to match: results.home_id == teams.id
df_full = df_results.merge(
    df_teams[['id', 'name', 'code']], # We only need these columns from teams
    left_on='home_id', 
    right_on='id', 
    how='left'
)

# Rename the merged columns so it is clear they describe the home team
df_full = df_full.rename(columns={'name': 'home_team_name', 'code': 'home_team_code'})

# Show the first 5 rows with the new Team Name
df_full[['date', 'home_team_name', 'home_goals_full_time', 'away_goals_full_time', 'result']].head()
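
The merge above resolves only `home_id`. The same lookup can be repeated for `away_id` so that both sides of each fixture are readable. The following is a minimal sketch on toy data, assuming the `id`/`name` layout described above; the helper `add_team_name` and the sample values are hypothetical. Passing `validate="many_to_one"` asks pandas to raise an error if a team ID appears more than once in the lookup table, which would otherwise silently duplicate match rows.

```python
import pandas as pd

# Toy tables mirroring the id/name layout; values are made up for illustration
teams = pd.DataFrame({"id": [1, 2, 3], "name": ["Team A", "Team B", "Team C"]})
results = pd.DataFrame({
    "home_id": [1, 2, 3],
    "away_id": [2, 3, 1],
    "home_goals_full_time": [2, 0, 1],
    "away_goals_full_time": [1, 0, 3],
})

def add_team_name(df, teams, id_col, name_col):
    """Left-merge the team name for the IDs in `id_col` into a new `name_col`."""
    # Renaming the lookup columns first avoids a leftover 'id' column after the merge
    lookup = teams[["id", "name"]].rename(columns={"id": id_col, "name": name_col})
    return df.merge(lookup, on=id_col, how="left", validate="many_to_one")

matches = add_team_name(results, teams, "home_id", "home_team_name")
matches = add_team_name(matches, teams, "away_id", "away_team_name")

# Any ID missing from the teams table would show up as NaN in the name columns
unmatched = int(matches[["home_team_name", "away_team_name"]].isna().sum().sum())
print(matches[["home_team_name", "away_team_name"]])
print(f"Unmatched team references: {unmatched}")
```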

### 📊 Visualization: Top Scoring Home Teams
Let's analyze which teams were the most dangerous when playing at their home stadium during this season.

In [None]:
# 1. Group by Team Name and Sum the Home Goals
home_goals = df_full.groupby('home_team_name')['home_goals_full_time'].sum().sort_values(ascending=True)

# 2. Plotting
plt.figure(figsize=(10, 8))
home_goals.plot(kind='barh', color='#2ecc71', edgecolor='black')

plt.title('Total Home Goals Scored (La Liga 2015-2016)', fontsize=16)
plt.xlabel('Number of Goals', fontsize=12)
plt.ylabel('Team', fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7)

# Add value labels to the bars
for index, value in enumerate(home_goals):
    plt.text(value + 0.5, index, str(value), va='center')

plt.show()
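
Total goals alone can hide differences in the number of home matches played. As a rough sketch, the grouping step can also report the match count and the goals per home match. The table below is toy data with made-up values; only the column names follow the ones used above.

```python
import pandas as pd

# Toy match table; team names and scores are illustrative only
toy_matches = pd.DataFrame({
    "home_team_name": ["Team A", "Team B", "Team A", "Team C", "Team B", "Team A"],
    "home_goals_full_time": [3, 1, 2, 0, 2, 1],
})

# One row per team: total goals, number of home matches, and the average per match
summary = (
    toy_matches.groupby("home_team_name")["home_goals_full_time"]
    .agg(total_goals="sum", home_matches="count", goals_per_match="mean")
    .sort_values("total_goals", ascending=False)
)

# Keep only the two highest-scoring home teams
top_two = summary.head(2)
print(top_two)
```

On the real data, `summary["total_goals"]` could be passed to the same `barh` plot used above after sorting it in ascending order.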

---
### 🚀 Ready to level up?

This was just a sample of the **2015-2016** season. 

The **Premium Repository** includes:
* 📅 **Full History:** Data from year 2000 to the current season.
* 🤖 **Automatic Updates:** New matches added every week via GitHub Actions.
* 🌍 **More Leagues:** Premier League, Serie A, Bundesliga, and more.
* 📈 **Advanced Stats:** Betting odds, corners, shots on target.

👉 **[Click here to become a Sponsor and get full access](TU_LINK_A_PATREON_AQUI)**