# Data Analysis with AI

This notebook guides you through an exploratory data analysis (EDA) of the OECD PISA dataset and an accompanying economic dataset. The focus is on PISA test results, with economic data integrated where it adds insight.


### OECD PISA Data

The OECD Programme for International Student Assessment (PISA) dataset contains the results of a worldwide study by the Organisation for Economic Co-operation and Development (OECD) that measures the scholastic performance of 15-year-old school pupils in mathematics, science, and reading. The dataset includes the following columns:

- `index`: Index of the row.
- `LOCATION`: Country code.
- `SUBJECT`: Subject (e.g., Mathematics, Reading, Science).
- `GENDER`: Gender of the students (e.g., BOY, GIRL).
- `TIME`: Year of the data.
- `Value`: PISA score.

### Economic and Education Data

The Economic and Education dataset includes various economic indicators related to education, providing insights into the economic context in which the PISA scores were achieved. The dataset includes the following columns:

- `index_code`: Index of the row.
- `expenditure_on_education_pct_gdp`: Government expenditure on education as a percentage of GDP.
- `mortality_rate_infant`: Infant mortality rate.
- `gini_index`: A measure of income inequality.
- `gdp_per_capita_ppp`: Gross Domestic Product per capita, Purchasing Power Parity.
- `inflation_consumer_prices`: Inflation rate based on consumer prices.
- `intentional_homicides`: Rate of intentional homicides.
- `unemployment`: Unemployment rate.
- `gross_fixed_capital_formation`: Gross fixed capital formation as a percentage of GDP.
- `population_density`: Population density.
- `suicide_mortality_rate`: Suicide mortality rate.
- `tax_revenue`: Tax revenue as a percentage of GDP.
- `taxes_on_income_profits_capital`: Taxes on income, profits, and capital gains as a percentage of GDP.
- `alcohol_consumption_per_capita`: Alcohol consumption per capita.
- `government_health_expenditure_pct_gdp`: Government health expenditure as a percentage of GDP.
- `urban_population_pct_total`: Urban population as a percentage of total population.
- `country`: Name of the country.
- `time`: Year of the data.
- `sex`: Gender of the population segment (if applicable).
- `rating`: Additional rating or score related to the dataset (if applicable).

## 1. Purpose and Analysis

The purpose of this notebook is to show how to perform a comprehensive data analysis using Gemini in Google Colab without writing a single line of code. Instead, we use natural language prompts to guide the analysis.

### The types of analysis that will be performed include:

```python
# @title
import ipywidgets as widgets
from IPython.display import display

tasks = [
    "1. Loading and Inspecting the Data",
    "2. Basic Descriptive Statistics",
    "3. Handling Missing Values",
    "4. Correlation Analysis",
    "5. Trend Analysis",
    "6. Gender Disparities",
    "7. Subject Performance",
    "8. Economic Indicators and Education Performance",
    "9. Education Expenditure Analysis",
    "10. Income Inequality and Education",
    "11. Policy Impact Analysis",
    "12. Cross-Subject Analysis",
    "13. Regional Analysis",
    "14. Visualization of Key Insights"
]

# Create a checkbox for each task
checkboxes = [widgets.Checkbox(value=False, description=task) for task in tasks]

# Function to handle checkbox state change
def on_checkbox_change(change):
    if change['type'] == 'change' and change['name'] == 'value':
        task = change['owner'].description
        status = "completed" if change['new'] else "not completed"
        print(f"Task '{task}' is {status}")

# Link each checkbox to the handler function
for checkbox in checkboxes:
    checkbox.observe(on_checkbox_change)

# Create a VBox to hold the checkboxes
checklist = widgets.VBox(checkboxes)

# Display the checklist
display(checklist)
```


## 2. Loading the Data
Load the PISA and economic datasets from the following URLs:

* https://raw.githubusercontent.com/eduhubai/YouTube-Gemini-Colab-OECD-PISA-Data-Analysis/main/OECD_PISA_data.csv
* https://raw.githubusercontent.com/eduhubai/YouTube-Gemini-Colab-OECD-PISA-Data-Analysis/main/economics_and_education_dataset_CSV.csv
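
The code Gemini generates for this prompt will vary from run to run. As a rough sketch of what it could look like with pandas, the helper below reads both files. The name `load_datasets` and the constants `PISA_URL` and `ECON_URL` are illustrative assumptions; only the two URLs come from this notebook.

```python
import pandas as pd

# The two URLs listed in this section, split across lines for readability.
PISA_URL = (
    "https://raw.githubusercontent.com/eduhubai/"
    "YouTube-Gemini-Colab-OECD-PISA-Data-Analysis/main/OECD_PISA_data.csv"
)
ECON_URL = (
    "https://raw.githubusercontent.com/eduhubai/"
    "YouTube-Gemini-Colab-OECD-PISA-Data-Analysis/main/"
    "economics_and_education_dataset_CSV.csv"
)


def load_datasets(pisa_source=PISA_URL, econ_source=ECON_URL):
    """Read both CSV files (URL, local path, or file-like object) into DataFrames.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    pisa_df = pd.read_csv(pisa_source)
    econ_df = pd.read_csv(econ_source)
    return pisa_df, econ_df
```

Calling `pisa_df, econ_df = load_datasets()` in a Colab cell would download both files; passing local paths instead works the same way.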


## 3. Basic Information
Display basic information about both datasets to understand their structure.
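
A prompt like this typically resolves to a few pandas inspection calls. The sketch below runs them on a small stand-in DataFrame that uses the PISA column names from this notebook with made-up values; the `SUBJECT` labels and the helper name `describe_structure` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the PISA data: real column names, made-up values.
pisa_df = pd.DataFrame(
    {
        "index": [0, 1, 2],
        "LOCATION": ["AUS", "AUS", "FIN"],
        "SUBJECT": ["PISAMATH", "PISAREAD", "PISAMATH"],
        "GENDER": ["BOY", "GIRL", "BOY"],
        "TIME": [2003, 2003, 2006],
        "Value": [500.0, 520.0, 540.0],
    }
)


def describe_structure(df, name):
    """Print the shape, column dtypes, and non-null counts; return the shape."""
    print(f"--- {name} ---")
    print(f"rows={df.shape[0]}, columns={df.shape[1]}")
    df.info()  # dtypes and non-null counts per column
    return df.shape


shape = describe_structure(pisa_df, "PISA")
```

The same helper can be applied to the economic dataset to compare the two structures side by side.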

## 4. Summary Statistics
Generate summary statistics for both datasets.
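
As a hedged illustration of what this step might produce, the sketch below uses `DataFrame.describe` for numeric columns and `value_counts` for a text column. The data are a made-up stand-in that only borrows the PISA column names.

```python
import pandas as pd

# Toy stand-in for the PISA data: real column names, made-up values.
pisa_df = pd.DataFrame(
    {
        "LOCATION": ["AUS", "AUS", "FIN"],
        "SUBJECT": ["PISAMATH", "PISAREAD", "PISAMATH"],
        "GENDER": ["BOY", "GIRL", "BOY"],
        "TIME": [2003, 2003, 2006],
        "Value": [500.0, 520.0, 540.0],
    }
)

# Count, mean, std, min, quartiles, and max for the numeric columns.
numeric_summary = pisa_df.describe()
print(numeric_summary)

# Frequency of each label in a text column.
subject_counts = pisa_df["SUBJECT"].value_counts()
print(subject_counts)
```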

## 5. Missing Values
Check for and handle missing values in both datasets.
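
How missing values should be handled depends on the analysis, and Gemini may pick a different strategy. The sketch below counts missing entries per column and then fills numeric gaps with the column median, which is only one possible choice and an assumption here. The values are made up; the column names come from the economic dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the economic data: real column names, made-up values.
econ_df = pd.DataFrame(
    {
        "country": ["AUS", "FIN", "JPN", "MEX"],
        "gini_index": [34.0, np.nan, 32.0, 46.0],
        "unemployment": [5.0, 8.0, np.nan, 4.0],
    }
)

# Step 1: count missing values per column.
missing_counts = econ_df.isna().sum()
print(missing_counts)

# Step 2 (one option among several): fill numeric gaps with the column median.
numeric_cols = econ_df.select_dtypes(include="number").columns
econ_filled = econ_df.copy()
econ_filled[numeric_cols] = econ_filled[numeric_cols].fillna(
    econ_filled[numeric_cols].median()
)
print(econ_filled)
```

Dropping incomplete rows with `econ_df.dropna()` is the simpler alternative, at the cost of losing observations.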

## 6. Correlation Analysis
Perform correlation analysis to identify relationships between numerical variables.
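
One plausible form of this step is a Pearson correlation matrix over the numeric columns. The sketch below uses made-up values chosen so the relationships are exact; the column names come from the economic dataset.

```python
import pandas as pd

# Toy stand-in for the economic data: real column names, made-up values.
econ_df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "gdp_per_capita_ppp": [10.0, 20.0, 30.0, 40.0],
        "expenditure_on_education_pct_gdp": [4.0, 5.0, 6.0, 7.0],
        "gini_index": [50.0, 40.0, 30.0, 20.0],
    }
)

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr_matrix = econ_df.select_dtypes(include="number").corr()
print(corr_matrix.round(2))
```

In this toy data, GDP and education expenditure rise together (correlation 1), while the Gini index falls as GDP rises (correlation -1). Real data will show weaker relationships, and a correlation on its own does not establish causation.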

## 7. Data Preprocessing / Data Cleaning

### Deal with the PISA dataset first

1. Show the first 5 rows.
2. Convert the `LOCATION`, `SUBJECT`, `GENDER`, and `TIME` columns to categorical variables.
3. Rename `TIME` to `YEAR` and `LOCATION` to `COUNTRY`.
4. Drop the `index` column.
5. Remove "PISA" from the values in the `SUBJECT` column.
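
The five steps above could translate into pandas roughly as follows. This is a sketch on made-up rows; the `SUBJECT` labels `PISAMATH` and `PISAREAD` are assumed for illustration and may not match the real file exactly.

```python
import pandas as pd

# Toy stand-in for the PISA data: real column names, made-up values.
pisa_df = pd.DataFrame(
    {
        "index": [0, 1, 2],
        "LOCATION": ["AUS", "AUS", "FIN"],
        "SUBJECT": ["PISAMATH", "PISAREAD", "PISAMATH"],
        "GENDER": ["BOY", "GIRL", "BOY"],
        "TIME": [2003, 2003, 2006],
        "Value": [500.0, 520.0, 540.0],
    }
)

# 1. Show the first 5 rows.
print(pisa_df.head())

# 2. Convert the four label columns to categorical variables.
cat_cols = ["LOCATION", "SUBJECT", "GENDER", "TIME"]
pisa_clean = pisa_df.astype({col: "category" for col in cat_cols})

# 3. Rename TIME to YEAR and LOCATION to COUNTRY.
pisa_clean = pisa_clean.rename(columns={"TIME": "YEAR", "LOCATION": "COUNTRY"})

# 4. Drop the index column.
pisa_clean = pisa_clean.drop(columns=["index"])

# 5. Remove "PISA" from the SUBJECT values, then restore the categorical dtype.
pisa_clean["SUBJECT"] = (
    pisa_clean["SUBJECT"]
    .astype(str)
    .str.replace("PISA", "", regex=False)
    .astype("category")
)

print(pisa_clean.dtypes)
```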

### Deal with the economic dataset

1. Show the first 5 rows of the economic dataset.

2. Rename `time` to `YEAR`, `country` to `COUNTRY`, and `sex` to `GENDER`.
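
A minimal sketch of these two steps is shown below. The rows are made up, and the `sex` labels in particular are assumed; only the column names are taken from the dataset description.

```python
import pandas as pd

# Toy stand-in for the economic data: real column names, made-up values.
econ_df = pd.DataFrame(
    {
        "country": ["AUS", "AUS", "FIN"],
        "time": [2003, 2003, 2006],
        "sex": ["BOY", "GIRL", "BOY"],
        "gini_index": [34.0, 34.0, 28.0],
    }
)

# 1. Show the first 5 rows.
print(econ_df.head())

# 2. Rename the key columns so they match the cleaned PISA column names.
econ_clean = econ_df.rename(
    columns={"time": "YEAR", "country": "COUNTRY", "sex": "GENDER"}
)
print(econ_clean.columns.tolist())
```

With matching `COUNTRY`, `YEAR`, and `GENDER` column names, the two datasets share a common set of keys for later comparison.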

## 8. Regional Analysis
Group countries by regions (e.g., Europe, Asia) and analyze regional performance trends.
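
The datasets described here do not include a region column, so a grouping needs a lookup table. The sketch below assumes a hand-written mapping named `REGION_BY_COUNTRY` (hypothetical and incomplete) and made-up scores, then averages scores per region and year.

```python
import pandas as pd

# Hypothetical, incomplete lookup from country code to region.
REGION_BY_COUNTRY = {
    "FIN": "Europe",
    "DEU": "Europe",
    "JPN": "Asia",
    "KOR": "Asia",
}

# Toy stand-in for the cleaned PISA data: made-up values.
pisa_clean = pd.DataFrame(
    {
        "COUNTRY": ["FIN", "DEU", "JPN", "KOR", "FIN", "DEU"],
        "YEAR": [2018, 2018, 2018, 2018, 2015, 2015],
        "Value": [500.0, 510.0, 530.0, 520.0, 505.0, 495.0],
    }
)

# Attach a region to every row; unmapped countries would become NaN.
pisa_clean["REGION"] = pisa_clean["COUNTRY"].map(REGION_BY_COUNTRY)

# Mean score per region and year.
regional_trend = (
    pisa_clean.groupby(["REGION", "YEAR"], as_index=False)["Value"].mean()
)
print(regional_trend)
```

Rows whose country is missing from the mapping get no region and are excluded by `groupby`, so it is worth checking `pisa_clean["REGION"].isna().sum()` before interpreting the result.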

## 9. Visualization of Key Insights
Create visualizations to illustrate the relationships and trends found in the analysis.
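
Which charts are useful depends on the findings. As one hedged example, the sketch below draws the mean score per year for each gender with pandas and matplotlib. The helper name `plot_score_trend` is an assumption, and the data are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the cleaned PISA data: made-up values.
pisa_clean = pd.DataFrame(
    {
        "YEAR": [2012, 2012, 2015, 2015, 2018, 2018],
        "GENDER": ["BOY", "GIRL", "BOY", "GIRL", "BOY", "GIRL"],
        "Value": [495.0, 505.0, 498.0, 507.0, 500.0, 510.0],
    }
)


def plot_score_trend(df):
    """Plot the mean score per year, one line per gender; return the Axes."""
    trend = df.pivot_table(
        index="YEAR", columns="GENDER", values="Value", aggfunc="mean"
    )
    ax = trend.plot(marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Mean PISA score")
    ax.set_title("Mean PISA score by gender over time")
    return ax


ax = plot_score_trend(pisa_clean)
```

In Colab the figure renders automatically at the end of the cell; elsewhere, `plt.show()` displays it.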