
# Candidate Evaluation Exercises for SemiSenior Profile

This notebook contains exercises designed to evaluate a candidate's proficiency in Python programming, PySpark data processing, and Azure cloud data solutions.



## Python Programming Exercise

**Task:** Write a Python class that represents a simple bank account. The class should have methods to deposit, withdraw, and check the balance, with basic error handling for withdrawal limits.


In [0]:
class StandarBankOperations:
    """Basic account operations: deposit, withdraw and balance report."""

    def __init__(self, name: str, bank: str):
        self.name = name
        self.bank = bank
        self.deposit_register = []
        self.withdraw_register = []
        self.total_amount = 0

    def deposit(self, amount: float):
        # Reject missing, zero or negative amounts instead of silently ignoring them
        if amount is None or amount <= 0:
            raise ValueError(f'Deposit amount must be positive, got {amount}')
        self.deposit_register.append(amount)
        self.total_amount += amount

    def withdraw(self, amount: float):
        if amount is None or amount <= 0:
            raise ValueError(f'Withdrawal amount must be positive, got {amount}')
        # Withdrawal limit: the balance may never go below zero
        if amount > self.total_amount:
            raise ValueError(
                f"Not enough money in the account: balance {self.total_amount}, requested {amount}"
            )
        self.withdraw_register.append(amount)
        self.total_amount -= amount

    def balance(self):
        # Return copies of the registers so callers cannot mutate the account state
        return {'name': self.name,
                'bank': self.bank,
                'depositRegister': list(self.deposit_register),
                'withdrawRegister': list(self.withdraw_register),
                'TotalAmount': self.total_amount}
    

In [0]:
class Bancolombia(StandarBankOperations):
    def __init__(self, name: str):
        super().__init__(name, 'Bancolombia')
        self.__withdrawLimits = 300  # maximum amount allowed per withdrawal

    def withdraw(self, amount: float):
        # Enforce the per-transaction limit, then reuse the base-class balance checks
        if amount is not None and amount > self.__withdrawLimits:
            raise ValueError(
                f'Withdrawals are limited to {self.__withdrawLimits}, requested {amount}'
            )
        super().withdraw(amount)


In [0]:
clientone = Bancolombia("Sergio")
clientone.deposit(2030)
clientone.deposit(2030)
clientone.deposit(2030)
clientone.withdraw(300)
print(clientone.balance())
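A common refinement is to raise dedicated exception types, so callers can tell a limit violation apart from insufficient funds. The sketch below is self-contained and uses hypothetical names (`InsufficientFundsError`, `WithdrawalLimitError`, `LimitedAccount`) that are not part of the classes above; the 300 limit mirrors the `Bancolombia` example.

```python
class InsufficientFundsError(ValueError):
    """Hypothetical: raised when a withdrawal exceeds the available balance."""


class WithdrawalLimitError(ValueError):
    """Hypothetical: raised when one withdrawal exceeds the per-transaction limit."""


class LimitedAccount:
    """Hypothetical minimal account with a per-transaction withdrawal limit."""

    def __init__(self, withdraw_limit: float = 300) -> None:
        self.balance = 0.0
        self.withdraw_limit = withdraw_limit

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Deposit amount must be positive, got {amount}")
        self.balance += amount

    def withdraw(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Withdrawal amount must be positive, got {amount}")
        # Check the per-transaction limit first, then the available balance
        if amount > self.withdraw_limit:
            raise WithdrawalLimitError(
                f"{amount} exceeds the per-transaction limit of {self.withdraw_limit}"
            )
        if amount > self.balance:
            raise InsufficientFundsError(
                f"Cannot withdraw {amount}; balance is {self.balance}"
            )
        self.balance -= amount


account = LimitedAccount()
account.deposit(2030)
account.withdraw(300)  # allowed: equal to the limit
try:
    account.withdraw(500)  # rejected: above the limit
except WithdrawalLimitError as err:
    print(f"Rejected: {err}")
print(account.balance)
```

Because both exceptions subclass `ValueError`, existing `except ValueError` handlers keep working while new code can catch the more specific type.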


## PySpark Data Processing Exercise

**Task:** Given a PySpark DataFrame `df` with columns `name` and `salary`, write a PySpark query to calculate the average salary and filter out individuals earning more than the average salary.

Download the dataset from Kaggle

In [0]:
url_kaggle = 'https://www.kaggle.com/datasets/sazidthe1/data-science-salaries'
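`url_kaggle` is not used later; the download cell hard-codes the `owner/dataset` slug instead. A small helper (hypothetical, not part of the Kaggle client) could derive the slug from the URL, assuming Kaggle dataset URLs keep the `/datasets/<owner>/<dataset>` path shape:

```python
from urllib.parse import urlparse


def dataset_slug_from_url(url: str) -> str:
    """Return the '<owner>/<dataset>' slug from a Kaggle dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<dataset>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a Kaggle dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


print(dataset_slug_from_url("https://www.kaggle.com/datasets/sazidthe1/data-science-salaries"))
```

The result could then be passed as `api.dataset_download_files(dataset_slug_from_url(url_kaggle), path='./', unzip=True)`.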

In [0]:
!pip install kaggle
dbutils.library.restartPython()

In [0]:
!mkdir -p ~/.kaggle
!echo '{"username":"sergioquiroga0101","key":"0000000000"}' > ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle/
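Writing the key with `echo` leaves it in the notebook source. A minimal Python sketch that writes the same file from values kept outside the notebook, assuming they are available as the environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY` (which the Kaggle client can also read directly); `write_kaggle_config` is a hypothetical helper:

```python
import json
import os
from pathlib import Path


def write_kaggle_config(username: str, key: str, config_dir: Path) -> Path:
    """Write kaggle.json with owner-only permissions and return its path."""
    config_dir.mkdir(parents=True, exist_ok=True)  # same role as `mkdir -p`
    config_path = config_dir / "kaggle.json"
    config_path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(config_path, 0o600)  # same role as `chmod 600`
    return config_path


# In the notebook, target the default location the Kaggle client reads:
# write_kaggle_config(os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"],
#                     Path.home() / ".kaggle")
```

On Databricks the values could come from a secret scope instead of environment variables, in line with the Key Vault requirement in the Azure exercise.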

In [0]:
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

In [0]:
api.dataset_download_files('sazidthe1/data-science-salaries', path='./', unzip=True)


Create the DataFrame

In [0]:
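# inferSchema reads numeric columns such as salary as numbers instead of strings
df = (spark.read
      .option('header', 'true')
      .option('inferSchema', 'true')
      .csv('file:/Workspace/Repos/saquiroga85@misena.edu.co/databricks-exercises/Candidate-Evaluation-Exercises-SemiSenior-Profile/data_science_salaries.csv'))
df.printSchema()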

> The Kaggle dataset has no `name` column, so the query below reports `job_title` alongside `salary`.

In [0]:
from pyspark.sql.functions import avg, col

# Compute the average salary as a one-row DataFrame and pull the value to the driver
avg_salary = df.agg(avg(col('salary')).alias('avg')).head()['avg']

# Keep only the rows above the average, lowest salary first
df_filter = (df.filter(col('salary') > avg_salary)
             .select('job_title', 'salary')
             .orderBy('salary'))
df_filter.show()
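To sanity-check the Spark result on a handful of rows, the same "average, then keep rows strictly above it" logic can be written in plain Python. The sample rows below are made up for illustration, and `above_average` is a hypothetical helper. (Inside Spark, an alternative that avoids collecting the average to the driver is to `crossJoin` the one-row aggregate back onto `df`.)

```python
from statistics import mean


def above_average(rows: list[dict]) -> list[dict]:
    """Keep rows whose salary is strictly greater than the mean salary."""
    average = mean(row["salary"] for row in rows)
    # Sort ascending by salary, matching orderBy('salary') in the Spark query
    return sorted(
        (row for row in rows if row["salary"] > average),
        key=lambda row: row["salary"],
    )


# Hypothetical sample data, not taken from the Kaggle dataset
sample = [
    {"job_title": "Data Analyst", "salary": 60000},
    {"job_title": "Data Engineer", "salary": 90000},
    {"job_title": "Data Scientist", "salary": 120000},
]
print(above_average(sample))
```

The mean of the sample is 90000, so only the 120000 row survives; a row exactly at the average is excluded, as with `>` in the Spark filter.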



## Azure Cloud Data Solutions Exercise

**Task:** Design a cloud-based data pipeline using Azure services that ingests, processes, and visualizes large datasets. The solution should ensure data security, be cost-effective, and scale based on demand.

### Detailed Requirements
1. **Data Ingestion:** Azure Data Factory: ingests data automatically from different sources and in different formats, saves it to the storage account in a specified format, and orchestrates the whole pipeline.
2. **Data Storage:** Azure Storage: first create a storage account, then create Blob Storage containers inside it to hold the data.
3. **Data Processing:** Azure Databricks: processes the data with tools such as PySpark, Python, and SQL, and connects to the Blob Storage containers to read it.
4. **Data Visualization:** Power BI: Microsoft's native BI tool, which connects easily to the processed data to visualize it.
5. **Security and Compliance:**
   - **Key Vault:** stores sensitive credentials so they are not exposed in Databricks notebooks.
   - **IAM:** grants permissions to different users and to resources such as Databricks.
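The processing and security steps meet where Databricks reads from the storage account. The sketch below builds the storage URI and the Spark configuration key for an account access key, using the documented `wasbs://` (Blob Storage) and `abfss://` (ADLS Gen2) URI formats. The helper names, account, container, secret scope, and secret key are all hypothetical; the commented lines only run inside a Databricks notebook.

```python
def storage_uri(account: str, container: str, path: str, adls_gen2: bool = False) -> str:
    """Build a Spark-readable URI for a file in an Azure storage container."""
    if adls_gen2:
        # ADLS Gen2 (hierarchical namespace) endpoint, read through the ABFS driver
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
    # Blob Storage endpoint, read through the WASB driver
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str, adls_gen2: bool = False) -> str:
    """Spark config key that holds the storage account access key."""
    endpoint = "dfs" if adls_gen2 else "blob"
    return f"fs.azure.account.key.{account}.{endpoint}.core.windows.net"


# Inside Databricks, with a hypothetical Key Vault-backed secret scope "kv-scope"
# holding the account key under "storage-key", the key never appears in the notebook:
# spark.conf.set(account_key_conf("myaccount"),
#                dbutils.secrets.get(scope="kv-scope", key="storage-key"))
# df = spark.read.option("header", "true").csv(storage_uri("myaccount", "raw", "salaries.csv"))
```

Keeping the key in Key Vault and granting Databricks access through IAM covers requirement 5 without hard-coding credentials, unlike the `kaggle.json` cell earlier in the notebook.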