<!-- # Table of Contents
* [Introduction](#Introduction)
* [Data Preparation and Cleaning](#Data-Preparation-and-Cleaning)
  * [Importing the Data](#Importing-the-Data)
  * [Duplicate and Missing Values](#Duplicate-and-Missing-Values)
  * [Observations and Features](#Observations-and-Features)
  * [Outliers](#Outliers)
* [Exploratory Data Analysis](#Exploratory-Data-Analysis)
  * [Distribution of Features](#Distribution-of-Features)
  * [Distribution of Features by Category](#Distribution-of-Features-by-Category)
* [Correlation Analysis](#Correlation-Analysis)
* [Key Insights](#Key-Insights)
* [Summary](#Summary) -->

\newpage
# Introduction

This report presents an exploratory data analysis of [...] data. The primary goal is [to identify trends, relationships, and interesting angles] within the data that could serve as a foundation for media pitches or blog content, targeting both C-level executives and technical audiences in the IT and data industries. The analysis focuses on understanding [...].

The initial raw data was obtained from [...]. The analysed dataset has [...] features and [...] observations, ranging from [...] to [...]. For a detailed breakdown of the features, refer to @tbl-dictionary.

| Variable Name | Type    | Description |
| ------------- | ------- | ----------- |
| Var1 | STRING | This is var 1. |
| Var2 | INTEGER | This is var 2. |
| Date | TIMESTAMP | The date of the [...]. |


: Description of the variables in the dataset. {#tbl-dictionary}

# Data Preparation and Cleaning

In this section, I examine the dataset to ensure its quality and reliability before analysis. I check for missing or duplicate values, identify potential outliers or typos, and address any inconsistencies. Additionally, I create new derived features where needed to better capture important patterns and support the subsequent analysis.

A preview of the analysed dataset is presented below in @tbl-preview.

In [None]:
#| include: false
import sys
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap  # used for the custom colormap below

In [None]:
#| include: false
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

from src.viz_utils import (
    function_name,
    )

In [None]:
#| include: false
# Setting plotting style
plt.style.use("fast")
plt.rcParams.update({
    "font.family": "Avenir",       
    "text.color": "#130f35",
    "axes.labelcolor": "#130f35",
    "xtick.color": "#130f35",
    "ytick.color": "#130f35",
    "figure.facecolor": "none",
    "axes.facecolor": "none",
    "savefig.transparent": True,
})

oxylab_cmap = LinearSegmentedColormap.from_list(
    "custom_cmap", ["#130f35", "#52A8F8", "#23E6A8"]
)

In [None]:
#| label: tbl-preview
#| tbl-cap: "Preview of the raw data (first 5 rows)"
df = pd.read_csv("../data/file_name.csv")
print(df.shape)
df.head()

## Duplicate and Missing Values

In this section, I check whether the dataset contains duplicated observations or missing values. The outputs below show that the data has [...] missing values and [...] duplicated observations.

In [None]:
print("Number of missing values:")
df.isna().sum()
# df.dropna(subset=["var1"], inplace=True)

In [None]:
print("Number of duplicated values:")
df.duplicated().sum()
# df.drop_duplicates(inplace=True)
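
The two checks above can also be condensed into one per-column table, which is easier to quote in the text. This is a minimal sketch on a toy frame standing in for `df`; `missing_summary` is a hypothetical helper, not something provided by `src.viz_utils`.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Show the most affected columns first.
    return summary.sort_values("missing", ascending=False)


# Toy data standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "var1": ["a", "b", None, "b"],
    "var2": [1, 2, 3, 2],
})

summary = missing_summary(toy)
n_duplicates = int(toy.duplicated().sum())
print(summary)
print("Duplicated rows:", n_duplicates)
```

Whether to drop or impute the affected rows depends on the feature, so the sketch only reports and leaves the data unchanged.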

## Observations and Features

This section begins the detailed exploration of the dataset's structure. I will examine the characteristics of each column to ensure data integrity and understand the available information.

Specifically, for quantitative features (like [...]), I'll inspect the distribution and range of values. For categorical features (like [...]), I'll identify the distinct categories present and count the observations in each. This step confirms the data types and prepares the data for subsequent analysis.

In [None]:
#df["var1"].describe()
#df["var2"].unique()
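
One possible way to run these checks over every column at once is to split the columns by dtype first. The sketch below assumes a toy frame with the placeholder names `var1` and `var2`; `split_by_kind` is a hypothetical helper, and treating every non-numeric column as categorical is a simplifying assumption (date columns, for instance, would need separate handling).

```python
import pandas as pd


def split_by_kind(frame: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Split column names into numeric and non-numeric (treated as categorical)."""
    numeric = frame.select_dtypes(include="number").columns.tolist()
    categorical = frame.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


# Toy data standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "var1": ["a", "b", "b", "c"],
    "var2": [10, 20, 20, 40],
})

numeric_cols, categorical_cols = split_by_kind(toy)

# Range and spread for quantitative features.
numeric_stats = toy[numeric_cols].describe()

# Distinct categories and their frequencies for categorical features.
category_counts = {col: toy[col].value_counts() for col in categorical_cols}

print(numeric_stats)
for col, counts in category_counts.items():
    print(counts)
```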

## Outliers

In this section, I look for typos, obvious outliers, and other discrepancies in the data. The [...] column contains [...] values, thus [...]. Additionally, since there are [...], I will explore ways to consolidate or improve this feature.
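
A common first pass for numeric outliers is the interquartile range (IQR) rule, which flags values more than `k` IQRs outside the middle 50% of the data. The sketch below is one option rather than a prescribed method for this analysis; `iqr_outlier_mask` is a hypothetical helper, and the conventional `k = 1.5` is an assumption that may need tuning per feature.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy data standing in for a numeric column (illustrative values only).
toy = pd.Series([10, 12, 11, 13, 12, 11, 250], name="var2")

mask = iqr_outlier_mask(toy)
print(toy[mask])
```

Flagged values are candidates for inspection, not automatic removal: a flagged point may be a typo or a genuine extreme observation.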

## Feature Engineering

To gain deeper insight into the data, I'll create new features. These features support the core goals of the analysis by [...].
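
As an illustration, calendar features are a typical derivation when the data includes a timestamp column such as `Date` from the data dictionary. The sketch below uses a toy frame with made-up timestamps; the derived column names (`year`, `month`, `weekday`, `is_weekend`) are assumptions chosen for the example.

```python
import pandas as pd

# Toy data standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "Date": ["2024-01-15 08:30:00", "2024-02-03 17:45:00", "2024-02-04 23:10:00"],
})

# Parse the timestamp column; unparseable entries become NaT instead of raising.
toy["Date"] = pd.to_datetime(toy["Date"], errors="coerce")

# Derive calendar features from the parsed timestamps.
toy["year"] = toy["Date"].dt.year
toy["month"] = toy["Date"].dt.month
toy["weekday"] = toy["Date"].dt.day_name()
toy["is_weekend"] = toy["Date"].dt.dayofweek >= 5  # Saturday = 5, Sunday = 6

print(toy)
```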

# Exploratory Data Analysis

This section presents a detailed Exploratory Data Analysis. The goal is to summarize the main characteristics of the dataset visually and statistically, identify initial patterns, and form hypotheses that will guide the subsequent [...] phases. I'll start by [...].

## Distribution of Features

Understanding the distribution of each feature is the foundation of any analysis. I'll use graphical representations such as histograms and density plots to assess each feature's distribution.
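
A minimal sketch of such a plot, drawing one histogram per numeric column with plain matplotlib. The toy columns `var2` and `var3` are randomly generated stand-ins, and `plot_numeric_histograms` is a hypothetical helper rather than a function from `src.viz_utils`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_numeric_histograms(frame: pd.DataFrame, bins: int = 20):
    """Draw one histogram per numeric column and return the figure and axes."""
    numeric_cols = frame.select_dtypes(include="number").columns.tolist()
    if not numeric_cols:
        raise ValueError("The frame has no numeric columns to plot.")
    fig, axes = plt.subplots(
        1, len(numeric_cols), figsize=(4 * len(numeric_cols), 3), squeeze=False
    )
    for ax, col in zip(axes[0], numeric_cols):
        ax.hist(frame[col].dropna(), bins=bins)
        ax.set_title(col)
        ax.set_xlabel(col)
        ax.set_ylabel("Count")
    fig.tight_layout()
    return fig, axes[0]


# Randomly generated toy data (illustrative only, not real measurements).
rng = np.random.default_rng(seed=0)
toy = pd.DataFrame({
    "var2": rng.normal(loc=50, scale=10, size=200),
    "var3": rng.exponential(scale=5, size=200),
})

fig, axes = plot_numeric_histograms(toy)
```

Returning the figure and axes leaves styling and saving to the caller, so the same helper can feed both the notebook output and exported images.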

From this we can see that [...]

> GUIDELINES:
> - Spot the trend.
> - Ask why it happened.
> - Explore the dimensions to figure out the reason.
> - Identify patterns and summarise them.

## Distribution of Features by Category

Understanding how each feature is distributed within each category is equally important. I'll use graphical representations such as histograms and density plots to compare the distributions across categories.
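
Alongside the plots, a per-category numeric summary can make differences between groups easier to compare and quote. This is a minimal sketch on a toy frame; the column names `var1` (category) and `var2` (numeric) are the placeholder names from the data dictionary, and the choice of statistics is an assumption.

```python
import pandas as pd

# Toy data standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "var1": ["a", "a", "b", "b", "b"],
    "var2": [10, 20, 30, 40, 50],
})

# Summarize the numeric feature separately within each category.
by_category = toy.groupby("var1")["var2"].agg(["count", "mean", "median", "std"])
print(by_category)
```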

From this we can see that [...]

# Key Insights

This section synthesizes the most significant findings and patterns identified during the Exploratory Data Analysis (EDA) and subsequent modeling. We translate the statistical and visual evidence into clear, actionable insights regarding network performance, proxy efficiency, and data retrieval characteristics. These insights serve as the foundation for media narratives and strategic decision-making.

# Summary

This summary provides a concise overview of the entire project, recapping the data source, the scope of the analysis, and the primary conclusions reached regarding the performance and characteristics of the proxy network. It serves as a stand-alone digest for executive-level audiences and colleagues, highlighting the value and business implications of the findings.

# Suggestions for Further Improvements

Based on the patterns observed and the performance metrics analyzed, this section outlines potential next steps for refining the data collection process, enhancing the proxy network infrastructure, or expanding the scope of the analysis. These suggestions are intended to maximize efficiency and further optimize the data scraping solutions.