# Extraction notebook 

This notebook reads the *candidates* dataset from a CSV file into a Pandas DataFrame, renames and formats its columns, and writes the result to a MySQL database.

### Importing libraries and modules

The os and dotenv libraries load the database credentials from a .env file into environment variables, which keeps them out of the notebook itself. From the sqlalchemy library, the create_engine function builds the engine that manages connections to the MySQL database, and the text function wraps raw SQL strings so they can be executed through that engine. Finally, the pandas library loads the candidates dataset into a DataFrame for manipulation and writes it to the MySQL database.

In [1]:
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
import pandas as pd

### Establishing the database connection

In [2]:
# Load environment variables from the .env file
load_dotenv()

# Get the MySQL credentials from environment variables
user = os.getenv('MYSQL_USER')
password = os.getenv('MYSQL_PASSWORD')
host = os.getenv('MYSQL_HOST')
port = os.getenv('MYSQL_PORT')
dbname = os.getenv('MYSQL_DB')

# Create the database URL
db_url = f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{dbname}"

# Create an engine instance and verify that a connection can be opened
try:
    engine = create_engine(db_url)
    # The context manager closes the connection automatically
    with engine.connect():
        print("Connected to the database successfully")
except Exception as e:
    print(f"Error: {e}")


Connected to the database successfully
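
The connection cell assumes that every credential is set and that the password contains no characters with special meaning in a URL. If a variable is missing, os.getenv returns None and the literal string "None" ends up in the URL; a password containing characters such as @ or / breaks URL parsing. A minimal sketch of a more defensive URL builder follows. The helper name build_db_url and the REQUIRED_VARS tuple are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote_plus

# Hypothetical helper, not part of the original notebook.
REQUIRED_VARS = ("MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_HOST", "MYSQL_PORT", "MYSQL_DB")


def build_db_url(env=os.environ):
    """Build the MySQL URL, failing early if a credential is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # Percent-encode user and password so characters like '@' or '/'
    # are not mistaken for URL delimiters.
    user = quote_plus(env["MYSQL_USER"])
    password = quote_plus(env["MYSQL_PASSWORD"])
    return (
        f"mysql+mysqlconnector://{user}:{password}"
        f"@{env['MYSQL_HOST']}:{env['MYSQL_PORT']}/{env['MYSQL_DB']}"
    )
```

After load_dotenv() has run, the result of build_db_url() could be passed to create_engine in place of the hand-built db_url.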


### Reading the dataset and writing it to the database

In [5]:
csv_url = "../data/candidates.csv"

try:
    # Read CSV using semicolon (;) as separator
    df = pd.read_csv(csv_url, sep=";")

    # Rename columns to match MySQL table schema
    df.rename(columns={
        "First Name": "first_name",
        "Last Name": "last_name",
        "Email": "email",
        "Application Date": "application_date",
        "Country": "country",
        "YOE": "yoe",
        "Seniority": "seniority",
        "Technology": "technology",
        "Code Challenge Score": "code_challenge_score",
        "Technical Interview Score": "technical_interview"
    }, inplace=True)

    # Convert 'application_date' to DATE format
    df["application_date"] = pd.to_datetime(df["application_date"], dayfirst=True).dt.date

    print("CSV file loaded and formatted successfully:")
    print(df.head())  # Display first few rows for verification

    # Insert data into MySQL table (rows are appended if the table already exists)
    df.to_sql(name="candidates", con=engine, if_exists='append', index=False)
    print(f"✅ Data successfully inserted into 'candidates' in database '{dbname}'!")
except Exception as e:
    print(f"❌ Error: {e}")


CSV file loaded and formatted successfully:
   first_name   last_name                      email application_date  \
0  Bernadette   Langworth        leonard91@yahoo.com       2021-02-26   
1      Camryn    Reynolds        zelda56@hotmail.com       2021-09-09   
2       Larue      Spinka   okey_schultz41@gmail.com       2020-04-14   
3        Arch      Spinka     elvera_kulas@yahoo.com       2020-10-01   
4       Larue  Altenwerth  minnie.gislason@gmail.com       2020-05-20   

   country  yoe  seniority                         technology  \
0   Norway    2     Intern                      Data Engineer   
1   Panama   10     Intern                      Data Engineer   
2  Belarus    4  Mid-Level                     Client Success   
3  Eritrea   25    Trainee                          QA Manual   
4  Myanmar   13  Mid-Level  Social Media Community Management   

   code_challenge_score  technical_interview  
0                     3                    3  
1                     2         