# Take-Home Questions: Operators and Projections

## Instructions 

- Complete all 4 questions using Python and PyMongo.
- Include proper error handling and comments in your code.
- Test your solutions with the provided sample data.
- **Use the `faker` library to generate 10,000 sample employee documents matching the structure below.**
- Submit both your code and sample output.
- **Submit your Jupyter notebook containing all code, generated data, and results.**
- Explain your approach for complex queries.
- **Upload your Jupyter notebook to a public GitHub repository and include the link in your submission.**
- **Submit your GitHub repository link and the notebook file through the provided [Google Form submission link](https://forms.gle/FnwPBBnHhCPUGrk59).**

---

## Sample Employee Document

```python
{
    "_id": ObjectId("..."),
    "employee_id": "EMP001",
    "first_name": "John",
    "last_name": "Doe", 
    "email": "john.doe@company.com",
    "department": "Engineering",
    "position": "Senior Developer",
    "salary": 95000,
    "years_experience": 8,
    "performance_rating": 4.2,
    "skills": ["Python", "JavaScript", "MongoDB", "Docker"],
    "hire_date": ISODate("2020-03-15"),
    "last_promotion": ISODate("2022-06-01"),
    "is_remote": True,
    "address": {
        "city": "San Francisco",
        "state": "CA",
        "zip_code": "94102"
    }
}
```
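
The sample above uses mongosh-style `ISODate(...)` notation, which is not valid Python. As a minimal sketch, assuming the dates are stored as BSON dates, the same document can be written for PyMongo with `datetime` objects. The `_id` field is left out because it is added automatically on insert, and the variable name `sample_employee` is only illustrative.

```python
from datetime import datetime

# Python equivalent of the sample document; datetime values are stored as BSON dates
sample_employee = {
    "employee_id": "EMP001",
    "first_name": "John",
    "last_name": "Doe",
    "email": "john.doe@company.com",
    "department": "Engineering",
    "position": "Senior Developer",
    "salary": 95000,
    "years_experience": 8,
    "performance_rating": 4.2,
    "skills": ["Python", "JavaScript", "MongoDB", "Docker"],
    "hire_date": datetime(2020, 3, 15),       # replaces ISODate("2020-03-15")
    "last_promotion": datetime(2022, 6, 1),   # replaces ISODate("2022-06-01")
    "is_remote": True,
    "address": {"city": "San Francisco", "state": "CA", "zip_code": "94102"},
}

print(sample_employee["hire_date"].isoformat())
```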

---

## Tasks

Write Python functions using PyMongo to solve the following:

1. **High Performers Query:** Find all employees with performance rating >= 4.0 **AND** salary > 80,000. Return only their name, department, salary, and performance rating.
2. **Experience-Based Filtering:** Find employees with 5-10 years of experience (inclusive) who earn between $70,000 and $120,000. Project only essential contact information (name, email, department).
3. **Salary Range Analysis:** Find employees whose salary is **NOT** in the range of $60,000-$100,000. Show their full name (concatenated), current salary, and years of experience.
4. **Recent Hires:** Find employees hired in the last 2 years with performance rating > 3.5. Return custom fields showing "full_name", "tenure_months", and "annual_salary".

---

## Expected Deliverables

- Python functions with proper error handling.
- Sample output showing at least 3 results for each query.
- Comments explaining your operator choices.
- **Jupyter notebook file containing all code, generated data, and results.**
- **GitHub repository link containing your notebook.**
- **Submit your GitHub repo link and notebook file through the [Google Form submission link](https://forms.gle/FnwPBBnHhCPUGrk59).**

## Connecting to MongoDB and Generating 10k Employees with Faker

In [1]:
from pymongo import MongoClient
from datetime import datetime, timedelta
from bson.objectid import ObjectId
from faker import Faker
import random

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["company_db"]
collection = db["employees"]

# Initialize Faker
fake = Faker()

# List of departments, positions, and skills
departments = ["Engineering", "HR", "Marketing", "Sales", "Finance", "IT"]
positions = ["Junior Developer", "Senior Developer", "Team Lead", "HR Manager", "Marketing Specialist", "Sales Representative", "Financial Analyst", "System Administrator"]
skills = ["Python", "JavaScript", "MongoDB", "Docker", "SQL", "AWS", "Java", "React", "Node.js"]

# Generate 10,000 employees
employees = []
for i in range(10000):
    hire_date = fake.date_time_between(start_date="-5y", end_date="now")
    employee = {
        "_id": ObjectId(),
        "employee_id": f"EMP{i+1:03d}",
        "first_name": fake.first_name(),
        "last_name": fake.last_name(),
        "email": fake.email(),
        "department": random.choice(departments),
        "position": random.choice(positions),
        "salary": random.randint(50000, 150000),
        "years_experience": random.randint(1, 15),
        "performance_rating": round(random.uniform(2.0, 5.0), 1),
        "skills": random.sample(skills, random.randint(2, 5)),
        "hire_date": hire_date,
        # Cap at the current time so a promotion is never dated in the future
        "last_promotion": min(hire_date + timedelta(days=random.randint(0, 730)), datetime.now()),
        "is_remote": random.choice([True, False]),
        "address": {
            "city": fake.city(),
            "state": random.choice(["CA", "NY", "TX", "MA"]),
            "zip_code": fake.zipcode()
        }
    }
    employees.append(employee)

# Insert the data
result = collection.insert_many(employees)
print(f"Inserted {len(result.inserted_ids)} employees")

Inserted 10000 employees
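
Re-running the cell above appends another 10,000 documents each time, which would skew the query results. A minimal sketch of one way to make the seeding repeatable, assuming it is acceptable to wipe the collection first; `reseed_collection` is a hypothetical helper, not part of PyMongo.

```python
def reseed_collection(collection, documents):
    """Delete every existing document, then insert the new batch.

    `collection` is assumed to be a PyMongo Collection (or any object
    exposing delete_many/insert_many with the same return shapes).
    Returns a (deleted_count, inserted_count) tuple.
    """
    deleted = collection.delete_many({}).deleted_count
    inserted = len(collection.insert_many(documents).inserted_ids)
    return deleted, inserted
```

With the variables from the cell above, this would be called as `reseed_collection(collection, employees)` in place of the bare `insert_many` call.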


## Task 1: High Performers Query

In [2]:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["company_db"]
collection = db["employees"]

def high_performers_query():
    try:
        query = {
            "$and": [
                {"performance_rating": {"$gte": 4.0}},
                {"salary": {"$gt": 80000}}
            ]
        }
        projection = {"_id": 0, "first_name": 1, "last_name": 1, "department": 1, "salary": 1, "performance_rating": 1}
        result = collection.find(query, projection).limit(3)  # Limit to 3 results for the example
        for doc in result:
            print(doc)
    except Exception as e:
        print(f"Error: {e}")

high_performers_query()

{'first_name': 'Troy', 'last_name': 'Holmes', 'department': 'HR', 'salary': 130687, 'performance_rating': 4.5}
{'first_name': 'Kristen', 'last_name': 'Sanchez', 'department': 'Finance', 'salary': 88870, 'performance_rating': 4.0}
{'first_name': 'Aaron', 'last_name': 'Ray', 'department': 'Engineering', 'salary': 124024, 'performance_rating': 4.8}
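
Because the two conditions target different fields, the explicit `$and` is optional: MongoDB combines the top-level fields of a filter document with an implicit AND. A minimal sketch of the equivalent filter follows; the function name and its parameters are illustrative and not part of the assignment.

```python
def high_performers_filter(min_rating=4.0, min_salary=80000):
    """Build the Task 1 filter using MongoDB's implicit AND."""
    return {
        "performance_rating": {"$gte": min_rating},  # rating at or above the threshold
        "salary": {"$gt": min_salary},               # salary strictly above the threshold
    }

print(high_performers_filter())
```

It can be passed to `collection.find(high_performers_filter(), projection)` exactly like the `$and` version. An explicit `$and` is only required when the same field or operator has to appear more than once in the filter.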


## Task 2: Experience-Based Filtering

In [3]:
# Connect to MongoDB
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["company_db"]
collection = db["employees"]

def experience_based_filtering():
    try:
        query = {
            "$and": [
                {"years_experience": {"$gte": 5, "$lte": 10}},
                {"salary": {"$gte": 70000, "$lte": 120000}}
            ]
        }
        projection = {"_id": 0, "first_name": 1, "last_name": 1, "email": 1, "department": 1}
        result = collection.find(query, projection).limit(3)
        for doc in result:
            print(doc)
    except Exception as e:
        print(f"Error: {e}")

experience_based_filtering()

{'first_name': 'Micheal', 'last_name': 'Smith', 'email': 'johnnysmith@example.org', 'department': 'IT'}
{'first_name': 'Kristen', 'last_name': 'Sanchez', 'email': 'robertglenn@example.net', 'department': 'Finance'}
{'first_name': 'Melissa', 'last_name': 'Sanders', 'email': 'paulaconrad@example.org', 'department': 'IT'}
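
One way to check that the inclusive bounds behave as intended is to re-test documents in plain Python. The sketch below assumes each document carries `years_experience` and `salary`, which the contact-only projection above omits, so a wider projection would be needed for this check. `in_task2_range` is a hypothetical helper.

```python
def in_task2_range(doc):
    """Mirror the Task 2 filter: 5-10 years and $70,000-$120,000, both inclusive."""
    return 5 <= doc["years_experience"] <= 10 and 70000 <= doc["salary"] <= 120000

edge_cases = [
    {"years_experience": 5, "salary": 70000},    # both lower bounds: included
    {"years_experience": 10, "salary": 120000},  # both upper bounds: included
    {"years_experience": 4, "salary": 90000},    # experience just below range
    {"years_experience": 7, "salary": 120001},   # salary just above range
]
print([in_task2_range(d) for d in edge_cases])
```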


## Task 3: Salary Range Analysis

In [4]:
# Connect to MongoDB
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["company_db"]
collection = db["employees"]

def salary_range_analysis():
    try:
        query = {
            "salary": {"$not": {"$gte": 60000, "$lte": 100000}}
        }
        # Use $concat in an aggregation pipeline to build full_name
        pipeline = [
            {"$match": query},
            {"$project": {
                "_id": 0,
                "full_name": {"$concat": ["$first_name", " ", "$last_name"]},
                "salary": 1,
                "years_experience": 1
            }},
            {"$limit": 3}
        ]
        result = collection.aggregate(pipeline)
        for doc in result:
            print(doc)
    except Exception as e:
        print(f"Error: {e}")

salary_range_analysis()

{'salary': 130687, 'years_experience': 9, 'full_name': 'Troy Holmes'}
{'salary': 147200, 'years_experience': 12, 'full_name': 'Amber Collins'}
{'salary': 121885, 'years_experience': 2, 'full_name': 'Nicole Doyle'}
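
Note that `$not` also matches documents that have no `salary` field at all. If those should be excluded, an alternative is an explicit `$or` over the two outside ranges. A minimal sketch, with a plain-Python predicate for comparison; both function names are illustrative.

```python
def outside_salary_range_filter(low=60000, high=100000):
    """Match salaries strictly below `low` or strictly above `high`."""
    return {"$or": [{"salary": {"$lt": low}}, {"salary": {"$gt": high}}]}

def is_outside_range(salary, low=60000, high=100000):
    """Plain-Python version of the same condition, handy for spot checks."""
    return salary < low or salary > high

print(outside_salary_range_filter())
print([is_outside_range(s) for s in (59999, 60000, 100000, 100001)])
```

The returned filter can replace `query` in the `$match` stage above without changing the rest of the pipeline.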


## Task 4: Recent Hires

In [5]:
from datetime import datetime  # needed for datetime.now() if this cell runs on its own
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["company_db"]
collection = db["employees"]

from dateutil.relativedelta import relativedelta

def recent_hires():
    try:
        two_years_ago = datetime.now() - relativedelta(years=2)
        query = {
            "$and": [
                {"hire_date": {"$gte": two_years_ago}},
                {"performance_rating": {"$gt": 3.5}}
            ]
        }
        pipeline = [
            {"$match": query},
            {"$project": {
                "_id": 0,
                "full_name": {"$concat": ["$first_name", " ", "$last_name"]},
                "tenure_months": {
                    "$divide": [
                        {"$subtract": [datetime.now(), "$hire_date"]},
                        1000 * 60 * 60 * 24 * 30  # Milliseconds in an approximate 30-day month
                    ]
                },
                "annual_salary": "$salary"
            }},
            {"$limit": 3}
        ]
        result = collection.aggregate(pipeline)
        for doc in result:
            print(doc)
    except Exception as e:
        print(f"Error: {e}")

recent_hires()

{'full_name': 'Troy Holmes', 'tenure_months': 6.647415524691358, 'annual_salary': 130687}
{'full_name': 'Kristen Sanchez', 'tenure_months': 12.689292067901235, 'annual_salary': 88870}
{'full_name': 'Aaron Ray', 'tenure_months': 3.361051327160494, 'annual_salary': 124024}
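
Dividing by a fixed 30-day month yields fractional, approximate values. On MongoDB 5.0 or later, `$dateDiff` can return whole months instead. It counts month boundaries passed, so the result can differ from the number of full months elapsed. The sketch below shows the projection stage under that version assumption and uses the `$$NOW` aggregation variable instead of a client-side `datetime.now()`; `tenure_projection` is a hypothetical helper.

```python
def tenure_projection():
    """Build a $project stage that reports tenure in whole months via $dateDiff."""
    return {
        "$project": {
            "_id": 0,
            "full_name": {"$concat": ["$first_name", " ", "$last_name"]},
            "tenure_months": {
                "$dateDiff": {
                    "startDate": "$hire_date",
                    "endDate": "$$NOW",  # server-side current time
                    "unit": "month",
                }
            },
            "annual_salary": "$salary",
        }
    }

print(tenure_projection())
```

The stage would take the place of the `$project` entry in the pipeline above, leaving `$match` and `$limit` unchanged.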
