# Part 3: Data Analytics

This notebook performs the data analysis tasks described in Part 3 of the Rearc Quest.

In [1]:
%pip install python-dotenv pandas boto3




In [2]:
import os
import sys
import boto3
from dotenv import load_dotenv

# Add src directory to path to import common modules
sys.path.append(os.path.join(os.getcwd(), '..', 'src'))

from common.aws import S3Location
from analytics import run_analytics, AnalyticsConfig

# Load environment variables from .env file in the parent directory
load_dotenv(dotenv_path='../.env')


True

## Setup

The S3 locations for our datasets are configured from your `.env` file.

In [3]:
DATA_BUCKET = os.environ.get("DATA_BUCKET")
BLS_PREFIX = os.environ.get("BLS_PREFIX")
POPULATION_TABLE_PREFIX = os.environ.get("POPULATION_TABLE_PREFIX")

# Ensure that the environment variables were loaded
if not DATA_BUCKET:
    raise ValueError("DATA_BUCKET not found in environment variables. Make sure it is set in your .env file.")

config = AnalyticsConfig(
    bls_location=S3Location(bucket=DATA_BUCKET, prefix=BLS_PREFIX),
    population_location=S3Location(bucket=DATA_BUCKET, prefix=POPULATION_TABLE_PREFIX),
)
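
The cell above checks only `DATA_BUCKET`. A minimal sketch of checking all three variables at once follows. `require_env` is a hypothetical helper written for illustration; it is not part of `common.aws` or `analytics`. Like the original check, it treats an empty string as missing, which may be too strict if an empty prefix is valid in your setup.

```python
import os
from typing import Mapping


def require_env(names: list[str], environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the requested variables, raising if any is unset or empty.

    Hypothetical helper for illustration; not part of the notebook's modules.
    """
    # Collect every missing name so the error reports all of them at once
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError(
            f"Missing environment variables: {', '.join(missing)}. "
            "Make sure they are set in your .env file."
        )
    return {name: environ[name] for name in names}
```

Calling `require_env(["DATA_BUCKET", "BLS_PREFIX", "POPULATION_TABLE_PREFIX"])` after `load_dotenv` would then fail early with a single message naming every variable that is absent.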


## Run Analytics

Next, call `run_analytics`, which loads the data from S3 and performs all the required analyses.
Make sure your AWS credentials are configured in your environment (e.g., in `~/.aws/credentials`).

In [4]:
config.bls_location


S3Location(bucket='rearc-quest-data-nbatara-2025', prefix='bls/')

In [5]:
config.population_location


S3Location(bucket='rearc-quest-data-nbatara-2025', prefix='population/tables/')

In [6]:
try:
    analytics_results = run_analytics(config)
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure the data has been ingested into the S3 bucket and your credentials are set up.")
    # Re-raise so later cells do not fail with a NameError on analytics_results
    raise


2025-11-29 15:33:05,566 INFO [botocore.credentials] Found credentials in shared credentials file: ~/.aws/credentials
2025-11-29 15:33:06,168 INFO [botocore.credentials] Found credentials in shared credentials file: ~/.aws/credentials
2025-11-29 15:33:06,536 INFO [analytics] Computed analytic table
2025-11-29 15:33:06,536 INFO [analytics] Computed analytic table
2025-11-29 15:33:06,536 INFO [analytics] Computed analytic table


## Report 1: Population Statistics

Mean and standard deviation of the annual US population across the years [2013, 2018] inclusive.

In [7]:
analytics_results['population_stats']


| | year_range | mean | std |
|---|---|---|---|
| population | 2013-2018 | 322069808.0 | 4158441.0 |
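
The implementation of `run_analytics` is not shown in this notebook, so the following is only a plain-Python sketch of the calculation this report describes. The `rows` values are toy numbers, not the real population data, and `population_stats` is a hypothetical name. The sketch uses the sample standard deviation (`statistics.stdev`), which is also the default for pandas' `Series.std`; whether the notebook uses the sample or the population form is an assumption here.

```python
from statistics import mean, stdev

# Toy (year, population) rows for illustration only; not the real dataset.
rows = [
    (2012, 90), (2013, 100), (2014, 110), (2015, 120),
    (2016, 130), (2017, 140), (2018, 150), (2019, 999),
]


def population_stats(rows, start=2013, end=2018):
    """Mean and sample standard deviation of population for start <= year <= end."""
    # Keep only the years inside the inclusive range
    values = [population for year, population in rows if start <= year <= end]
    return {"year_range": f"{start}-{end}", "mean": mean(values), "std": stdev(values)}


stats = population_stats(rows)
```

Both endpoints are included in the filter, matching the "[2013, 2018] inclusive" wording of the report.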


## Report 2: Best Year by Series

For each `series_id`, the year with the largest sum of `value` across all quarters of that year.

In [8]:
analytics_results['best_year']


| | series_id | year | value |
|---|---|---|---|
| 0 | PRS30006011 | 2022 | 20.500 |
| 1 | PRS30006012 | 2022 | 17.100 |
| 2 | PRS30006013 | 1998 | 705.895 |
| 3 | PRS30006021 | 2010 | 17.700 |
| 4 | PRS30006022 | 2010 | 12.400 |
| ... | ... | ... | ... |
| 277 | PRS88003192 | 2002 | 282.800 |
| 278 | PRS88003193 | 2024 | 860.838 |
| 279 | PRS88003201 | 2022 | 37.200 |
| 280 | PRS88003202 | 2022 | 28.700 |
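
As a rough illustration of the "best year" logic, the sketch below sums `value` per `(series_id, year)` and then keeps the year with the largest total for each series. It is plain Python over toy rows; the data and the `best_year_by_series` name are assumptions for the example, and the notebook's actual implementation inside `run_analytics` may differ.

```python
from collections import defaultdict

# Toy (series_id, year, period, value) rows for illustration only.
rows = [
    ("A", 2012, "Q01", 1.0), ("A", 2012, "Q02", 2.0),
    ("A", 2013, "Q01", 4.0), ("A", 2013, "Q02", 0.5),
    ("B", 2012, "Q01", 3.0), ("B", 2013, "Q01", -1.0),
]


def best_year_by_series(rows):
    """For each series_id, return (year, total) for the year with the largest summed value."""
    # Step 1: sum the values of all periods within each (series_id, year)
    totals = defaultdict(float)
    for series_id, year, _period, value in rows:
        totals[(series_id, year)] += value

    # Step 2: keep the year with the largest total per series
    best = {}
    for (series_id, year), total in totals.items():
        if series_id not in best or total > best[series_id][1]:
            best[series_id] = (year, total)
    return best


best = best_year_by_series(rows)
```

In this sketch a tie between two years keeps whichever year was encountered first; the report does not say how ties are resolved.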


## Report 3: Series with Population

The `value` for `series_id = PRS30006032` and `period = Q01`, together with the `Population` for the same year. `Population` is left empty in rows with no matching population figure.
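
The filter-and-join for this report can be sketched in plain Python as below. The inputs are toy values, and `series_with_population` is a hypothetical name; the real logic lives in `run_analytics`, which this notebook does not show. The sketch keeps every matching series row and attaches `None` when a year has no population entry, which is an assumption about the join type.

```python
# Toy inputs for illustration only; not the real datasets.
series_rows = [
    ("PRS30006032", 2012, "Q01", 1.5),
    ("PRS30006032", 2013, "Q01", 0.5),
    ("PRS30006032", 2013, "Q02", 9.9),
    ("PRS30006011", 2013, "Q01", 7.0),
]
population_by_year = {2013: 1000}


def series_with_population(series_rows, population_by_year,
                           series_id="PRS30006032", period="Q01"):
    """Filter to one series and period, then attach population by year (None if absent)."""
    return [
        {
            "series_id": sid,
            "year": year,
            "period": per,
            "value": value,
            # dict.get returns None for years without a population figure
            "Population": population_by_year.get(year),
        }
        for sid, year, per, value in series_rows
        if sid == series_id and per == period
    ]


report = series_with_population(series_rows, population_by_year)
```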

In [9]:
analytics_results['series_population']


Unnamed: 0,series_id,year,period,value,Population
0,PRS30006032,1995,Q01,0.0,
1,PRS30006032,1996,Q01,-4.2,
2,PRS30006032,1997,Q01,2.8,
3,PRS30006032,1998,Q01,0.9,
4,PRS30006032,1999,Q01,-4.1,
5,PRS30006032,2000,Q01,0.5,
6,PRS30006032,2001,Q01,-6.3,
7,PRS30006032,2002,Q01,-6.6,
8,PRS30006032,2003,Q01,-5.7,
9,PRS30006032,2004,Q01,2.0,
