
# Candidate Evaluation Exercises for Semi-Senior Profile

This notebook contains exercises designed to evaluate a candidate's proficiency in Python programming, PySpark data processing, and AWS Cloud data solutions.



## Python Programming Exercise

**Task:** Write a Python class that represents a simple bank account. The class should have methods to deposit, withdraw, and check the balance, with basic error handling for withdrawal limits.

```python
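class BankAccount:
    """A simple bank account that tracks a single balance."""

    def __init__(self, initial_balance=0):
        self.balance = initial_balance

    def deposit(self, amount):
        # Reject zero or negative deposits so the balance cannot be reduced here.
        if amount <= 0:
            raise ValueError("Deposit amount must be positive")
        self.balance += amount
        return self.balance

    def withdraw(self, amount):
        # Reject zero or negative withdrawals, which would otherwise add funds.
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        # Never allow the balance to go below zero.
        if amount > self.balance:
            raise ValueError("Insufficient funds")
        self.balance -= amount
        return self.balance

    def get_balance(self):
        return self.balance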

# Example usage
account = BankAccount(100)
account.deposit(50)
print(account.get_balance())  # Output: 150
try:
    account.withdraw(200)
except ValueError as e:
    print(e)  # Output: Insufficient funds
```
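
The task mentions withdrawal limits, which can be read as a cap on each withdrawal in addition to the balance check. The following is a minimal sketch under that reading; the class name `LimitedBankAccount`, the `withdrawal_limit` parameter, and its default of 500 are assumptions made for illustration and are not part of the original exercise.

```python
class LimitedBankAccount:
    """Bank account with a hypothetical per-withdrawal cap (illustrative only)."""

    def __init__(self, initial_balance=0, withdrawal_limit=500):
        self.balance = initial_balance
        # Assumed rule: no single withdrawal may exceed this amount.
        self.withdrawal_limit = withdrawal_limit

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        if amount > self.withdrawal_limit:
            raise ValueError("Withdrawal limit exceeded")
        if amount > self.balance:
            raise ValueError("Insufficient funds")
        self.balance -= amount
        return self.balance


account = LimitedBankAccount(1000, withdrawal_limit=300)
print(account.withdraw(200))  # Output: 800
try:
    account.withdraw(400)
except ValueError as e:
    print(e)  # Output: Withdrawal limit exceeded
```

Checking the limit before the balance means a request that breaks both rules reports the limit first; a candidate could reasonably choose the opposite order.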


Response:


## PySpark Data Processing Exercise

**Task:** Given a PySpark DataFrame `df` with columns `name` and `salary`, write a PySpark query that calculates the average salary and returns only the individuals who earn more than that average.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col

spark = SparkSession.builder.appName("example").master("local").getOrCreate()
data = [("Alice", 50000), ("Bob", 40000), ("Charlie", 70000)]
df = spark.createDataFrame(data, ["name", "salary"])

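# Compute the average as a one-row DataFrame, then attach it to every row.
average_salary = df.agg(avg(col("salary")).alias("average_salary"))
df_with_avg = df.crossJoin(average_salary)
# Keep only the rows whose salary exceeds the average.
df_filtered = df_with_avg.filter(col("salary") > col("average_salary"))
df_filtered.show()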
```
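
To sanity-check what the query should return, the same calculation can be done in plain Python on the three sample rows. This is only a cross-check of the expected result and does not replace the PySpark solution; the variable `above_average` is introduced here for illustration.

```python
from statistics import mean

# Same sample rows as the PySpark example above.
data = [("Alice", 50000), ("Bob", 40000), ("Charlie", 70000)]

# Average salary across all rows.
average_salary = mean(salary for _, salary in data)

# Rows whose salary is strictly greater than the average.
above_average = [(name, salary) for name, salary in data if salary > average_salary]

print(round(average_salary, 2))  # Output: 53333.33
print(above_average)             # Output: [('Charlie', 70000)]
```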


Response:


## AWS Cloud Data Solutions Exercise

**Task:** Design a cloud-based data pipeline using AWS services that ingests, processes, and visualizes large datasets. The solution should ensure data security, be cost-effective, and scale based on demand.

### Detailed Requirements
1. **Data Ingestion:** Automate the ingestion of large, structured datasets into the cloud using AWS Glue.
2. **Data Storage:** Use Amazon S3 for raw data storage, employing partitioning to improve performance. Processed data can be stored in Amazon Redshift for analysis.
3. **Data Processing:** Use AWS Glue for ETL jobs, and consider Amazon Kinesis if real-time processing is required.
4. **Data Visualization:** Implement Amazon QuickSight for dashboard creation and data visualization.
5. **Security and Compliance:** Secure data using AWS KMS and IAM roles, and ensure all data transfers are encrypted using HTTPS.
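
The partitioning mentioned in the storage requirement is commonly done with Hive-style `key=value` prefixes in the S3 object key, a layout that query engines can use to skip irrelevant data. The sketch below only builds such a key as a string and makes no AWS calls; the helper `partitioned_key`, the `raw/` prefix, the dataset name, and the file name are all hypothetical choices for illustration.

```python
from datetime import date


def partitioned_key(dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style, date-partitioned S3 object key (illustrative layout)."""
    return (
        f"raw/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


print(partitioned_key("sales", date(2024, 3, 7), "part-0001.parquet"))
# Output: raw/sales/year=2024/month=03/day=07/part-0001.parquet
```

Partitioning by date suits data that is mostly queried by time range; a candidate might justify a different partition column based on the expected query patterns.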


Response: