# Data Pipeline MVP
## Research on mobile phones

Edmilson Prata da Silva

PUC-RJ - MBA in Data Science and Analytics

Data Engineering course

## BRONZE layer load script

## Imports

Imports the libraries the script needs.

In [0]:
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

## Data Loading

The data is retrieved from a public GitHub repository. The file was copied from Kaggle to GitHub because downloading it directly from Kaggle proved unstable.

After loading, the columns are renamed to follow the table's naming convention (snake_case). The data is then persisted without transformation, as the bronze layer convention requires; the only change is casting `launched_year` to string, so that every column is stored as text.
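The fifteen `withColumnRenamed` calls in the load cell all follow one mechanical rule: lowercase the header and turn every run of non-alphanumeric characters into `_`. A minimal sketch, assuming that rule holds for every header in the file, derives the mapping instead of hard-coding it; `to_snake_case` and `rename_map` are hypothetical names, not part of the original script.

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a raw CSV header such as 'Launched Price (USA)' into 'launched_price_usa'."""
    # Lowercase, then replace each run of non-alphanumeric characters with a single underscore
    snake = re.sub(r"[^0-9a-z]+", "_", name.lower())
    # Drop underscores left at the edges, e.g. by a trailing ')'
    return snake.strip("_")

# Headers as printed in the Spark schema of the raw CSV
raw_headers = [
    "Company Name", "Model Name", "Mobile Weight", "RAM", "Front Camera",
    "Back Camera", "Processor", "Battery Capacity", "Screen Size",
    "Launched Price (Pakistan)", "Launched Price (India)", "Launched Price (China)",
    "Launched Price (USA)", "Launched Price (Dubai)", "Launched Year",
]

# Hypothetical mapping from raw header to bronze column name
rename_map = {header: to_snake_case(header) for header in raw_headers}
```

In the notebook, the derived names could be applied in one call with `df_spark.toDF(*[to_snake_case(c) for c in df_spark.columns])`. Hard-coding keeps the target names explicit and reviewable; deriving them avoids typos, but a changed CSV header would silently produce a different column name.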

In [0]:
df_pandas = pd.read_csv(
    'https://github.com/edprata/pucrj_cellphones/raw/refs/heads/main/kaggle_mobile_dataset_2025.csv',
    sep=',', encoding='latin-1', skip_blank_lines=True, on_bad_lines='skip'
)
df_pandas.shape

Out[12]: (930, 15)
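With `on_bad_lines='skip'`, pandas drops lines that have more fields than the header without reporting how many were discarded. A minimal sketch using only the standard `csv` module, assuming the file's quoting matches pandas' defaults (double quotes), counts those lines so the 930 loaded rows can be compared against what was dropped; `count_overlong_rows` is a hypothetical helper, not part of the original script.

```python
import csv
import io

def count_overlong_rows(text: str, delimiter: str = ",") -> int:
    """Count data rows with more fields than the header, i.e. the rows on_bad_lines='skip' discards."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Blank lines come back as empty lists and are therefore never counted
    return sum(1 for row in reader if len(row) > len(header))

sample = (
    "Company Name,Model Name,Launched Year\n"
    "Apple,iPhone 16 128GB,2024\n"
    "Apple,iPhone 16 256GB,2024,extra\n"  # one field too many
)
print(count_overlong_rows(sample))  # 1
```

For the real file, the text could be fetched with `urllib.request.urlopen(url).read().decode("latin-1")`, matching the `encoding` passed to `read_csv` above.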

In [0]:
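# Get or create the Spark session (on Databricks, this returns the notebook's predefined `spark`)
spark = SparkSession.builder.appName("Load CSV from GitHub into Delta").getOrCreate()

# Convert the pandas DataFrame into a Spark DataFrame
df_spark = spark.createDataFrame(df_pandas)

# Print the Spark DataFrame schema
df_spark.printSchema()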

root
 |-- Company Name: string (nullable = true)
 |-- Model Name: string (nullable = true)
 |-- Mobile Weight: string (nullable = true)
 |-- RAM: string (nullable = true)
 |-- Front Camera: string (nullable = true)
 |-- Back Camera: string (nullable = true)
 |-- Processor: string (nullable = true)
 |-- Battery Capacity: string (nullable = true)
 |-- Screen Size: string (nullable = true)
 |-- Launched Price (Pakistan): string (nullable = true)
 |-- Launched Price (India): string (nullable = true)
 |-- Launched Price (China): string (nullable = true)
 |-- Launched Price (USA): string (nullable = true)
 |-- Launched Price (Dubai): string (nullable = true)
 |-- Launched Year: long (nullable = true)



In [0]:
# Get the active Spark session (getOrCreate() reuses an existing session if there is one)
spark = SparkSession.builder \
    .appName("Load CSV from GitHub into Delta") \
    .getOrCreate()

# Rename the columns to match the Delta table structure
df_spark = df_spark \
      .withColumnRenamed("Company Name", "company_name") \
      .withColumnRenamed("Model Name", "model_name") \
      .withColumnRenamed("Mobile Weight", "mobile_weight") \
      .withColumnRenamed("RAM", "ram") \
      .withColumnRenamed("Front Camera", "front_camera") \
      .withColumnRenamed("Back Camera", "back_camera") \
      .withColumnRenamed("Processor", "processor") \
      .withColumnRenamed("Battery Capacity", "battery_capacity") \
      .withColumnRenamed("Screen Size", "screen_size") \
      .withColumnRenamed("Launched Price (Pakistan)", "launched_price_pakistan") \
      .withColumnRenamed("Launched Price (India)", "launched_price_india") \
      .withColumnRenamed("Launched Price (China)", "launched_price_china") \
      .withColumnRenamed("Launched Price (USA)", "launched_price_usa") \
      .withColumnRenamed("Launched Price (Dubai)", "launched_price_dubai") \
      .withColumnRenamed("Launched Year", "launched_year")

# Cast the year column to string (the only non-string column in the inferred schema)
df_spark = df_spark.withColumn("launched_year", col("launched_year").cast("string"))

# Write the data to the Delta table; with mode("append"), re-running this cell inserts the same rows again
df_spark \
  .write \
  .format("delta") \
  .mode("append") \
  .saveAsTable("bronze.mobile_devices")

print("Data successfully loaded into table bronze.mobile_devices!")

Data successfully loaded into table bronze.mobile_devices!


### Table load test

Query the table to confirm that the load succeeded.
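A `LIMIT 10` query shows that the table is readable, but it does not confirm that every row arrived or that a re-run of the `append` cell did not duplicate data. A minimal sketch of a stricter check, assuming the expected columns are exactly those produced by the rename cell; `validate_bronze_load` and `EXPECTED_COLUMNS` are hypothetical names, and the 1860-row case is an illustrative second run, not an observed result.

```python
from typing import List

# Hypothetical list of expected bronze columns, in the order produced by the rename cell
EXPECTED_COLUMNS = [
    "company_name", "model_name", "mobile_weight", "ram", "front_camera",
    "back_camera", "processor", "battery_capacity", "screen_size",
    "launched_price_pakistan", "launched_price_india", "launched_price_china",
    "launched_price_usa", "launched_price_dubai", "launched_year",
]

def validate_bronze_load(source_rows: int, table_rows: int, table_columns: List[str]) -> List[str]:
    """Return the problems found; an empty list means the load looks consistent."""
    problems = []
    if table_columns != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {table_columns}")
    if table_rows == 0:
        problems.append("table is empty")
    elif table_rows != source_rows:
        # With mode("append"), each re-run of the load cell adds another copy of the rows
        problems.append(f"row count mismatch: source={source_rows}, table={table_rows}")
    return problems

print(validate_bronze_load(930, 930, EXPECTED_COLUMNS))   # []
print(validate_bronze_load(930, 1860, EXPECTED_COLUMNS))  # ['row count mismatch: source=930, table=1860']
```

In the notebook, the arguments could come from `len(df_pandas)`, `spark.table("bronze.mobile_devices").count()` and `spark.table("bronze.mobile_devices").columns`.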

In [0]:
%sql select * from bronze.mobile_devices limit 10

| company_name | model_name | mobile_weight | ram | front_camera | back_camera | processor | battery_capacity | screen_size | launched_price_pakistan | launched_price_india | launched_price_china | launched_price_usa | launched_price_dubai | launched_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Apple | iPhone 16 128GB | 174g | 6GB | 12MP | 48MP | A17 Bionic | 3,600mAh | 6.1 inches | PKR 224,999 | INR 79,999 | CNY 5,799 | USD 799 | AED 2,799 | 2024 |
| Apple | iPhone 16 256GB | 174g | 6GB | 12MP | 48MP | A17 Bionic | 3,600mAh | 6.1 inches | PKR 234,999 | INR 84,999 | CNY 6,099 | USD 849 | AED 2,999 | 2024 |
| Apple | iPhone 16 512GB | 174g | 6GB | 12MP | 48MP | A17 Bionic | 3,600mAh | 6.1 inches | PKR 244,999 | INR 89,999 | CNY 6,499 | USD 899 | AED 3,199 | 2024 |
| Apple | iPhone 16 Plus 128GB | 203g | 6GB | 12MP | 48MP | A17 Bionic | 4,200mAh | 6.7 inches | PKR 249,999 | INR 89,999 | CNY 6,199 | USD 899 | AED 3,199 | 2024 |
| Apple | iPhone 16 Plus 256GB | 203g | 6GB | 12MP | 48MP | A17 Bionic | 4,200mAh | 6.7 inches | PKR 259,999 | INR 94,999 | CNY 6,499 | USD 949 | AED 3,399 | 2024 |
| Apple | iPhone 16 Plus 512GB | 203g | 6GB | 12MP | 48MP | A17 Bionic | 4,200mAh | 6.7 inches | PKR 274,999 | INR 104,999 | CNY 6,999 | USD 999 | AED 3,599 | 2024 |
| Apple | iPhone 16 Pro 128GB | 206g | 6GB | 12MP / 4K | 50MP + 12MP | A17 Pro | 4,400mAh | 6.1 inches | PKR 284,999 | INR 99,999 | CNY 6,999 | USD 999 | AED 3,499 | 2024 |
| Apple | iPhone 16 Pro 256GB | 206g | 8GB | 12MP / 4K | 50MP + 12MP | A17 Pro | 4,400mAh | 6.1 inches | PKR 294,999 | INR 104,999 | CNY 7,099 | USD 1,049 | AED 3,699 | 2024 |
| Apple | iPhone 16 Pro 512GB | 206g | 8GB | 12MP / 4K | 50MP + 12MP | A17 Pro | 4,400mAh | 6.1 inches | PKR 314,999 | INR 114,999 | CNY 7,499 | USD 1,099 | AED 3,899 | 2024 |
| Apple | iPhone 16 Pro Max 128GB | 221g | 6GB | 12MP / 4K | 48MP + 12MP | A17 Pro | 4,500mAh | 6.7 inches | PKR 314,999 | INR 109,999 | CNY 7,499 | USD 1,099 | AED 3,799 | 2024 |
