<a href="https://colab.research.google.com/github/parsa-abbasi/Data-Preparation-and-Visualization-in-Python/blob/main/numpy_job_offers_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NumPy Exercise: Analyzing Job Offers
This notebook uses NumPy, a Python library for numerical computing, to analyze a small dataset of job offers in the technology industry.

## 🗄️ Data
The dataset consists of four parallel arrays: the element at index `i` of each array describes the same job offer.

* `job_titles`: an array of strings that represent different job titles offered in the industry.
* `salaries`: an array of strings that represent the salary ranges for the job offers. Each string contains two dollar amounts separated by a hyphen (for example, `$78,500 - $102,000`), representing the minimum and maximum salaries, respectively.
* `companies`: an array of strings that represent the names of the companies offering the job positions.
* `years_of_experience`: an array of integers that represent the required years of experience for the job offers.

In [None]:
import numpy as np

job_titles = np.array(['Data Analyst', 'Machine Learning Engineer',
                       'Data Analyst', 'Data Scientist',
                       'Data Scientist', 'Database Administrator',
                       'Database Administrator', 'Machine Learning Engineer',
                       'Data Scientist', 'Machine Learning Engineer',
                       'Data Analyst', 'Software Engineer',
                       'Back-End Developer', 'Data Analyst'])

salaries = np.array(['$78,500 - $102,000', '$130,000 - $190,000',
                     '$62,000 - $91,000', '$130,000 - $190,000',
                     '$165,000 - $240,000', '$85,000 - $120,000',
                     '$95,000 - $135,000', '$120,000 - $180,000',
                     '$123,000 - $220,000', '$190,000 - $240,000',
                     '$57,500 - $89,000', '$120,000 - $160,000',
                     '$110,000 - $145,000', '$70,000 - $110,000'])

companies = np.array(['Amazon', 'Apple',
                      'Walmart', 'Airbnb',
                      'Netflix', 'Microsoft',
                      'IBM', 'Tesla', 
                      'Microsoft', 'Google',
                      'Capital One', 'Facebook',
                      'Apple', 'JPMorgan Chase'])

years_of_experience = np.array([2, 3,
                                2, 5,
                                5, 5,
                                5, 5,
                                5, 5,
                                2, 3,
                                3, 3])

## 👔 Finding the unique job titles

In [None]:
unique_job_titles = np.unique(job_titles)
print("Unique job titles: ", unique_job_titles)

Unique job titles:  ['Back-End Developer' 'Data Analyst' 'Data Scientist'
 'Database Administrator' 'Machine Learning Engineer' 'Software Engineer']
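
As a small extension, `np.unique` can also report how many offers exist for each title. The sketch below assumes the same `job_titles` array as above; the name `counts` is introduced here for illustration only.

```python
import numpy as np

job_titles = np.array(['Data Analyst', 'Machine Learning Engineer',
                       'Data Analyst', 'Data Scientist',
                       'Data Scientist', 'Database Administrator',
                       'Database Administrator', 'Machine Learning Engineer',
                       'Data Scientist', 'Machine Learning Engineer',
                       'Data Analyst', 'Software Engineer',
                       'Back-End Developer', 'Data Analyst'])

# return_counts=True also returns how often each unique value occurs
unique_job_titles, counts = np.unique(job_titles, return_counts=True)

for title, count in zip(unique_job_titles, counts):
    print(title, count)
```

The counts come back in the same (sorted) order as the unique titles, so the two arrays can be zipped together directly.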


## 👴🏼 Finding the average years of experience

In [None]:
avg_years_of_experience = np.mean(years_of_experience)
print("Average years of experience: ", avg_years_of_experience)

Average years of experience:  3.7857142857142856


## 💰 Calculating the average salary
The salaries are given as ranges in the format `$XX,XXX - $XXX,XXX`, so each range must be converted to a single number.   
We can do this by removing the `$` and `,` characters, splitting the string on the `-` character, converting the resulting strings to floats, and then taking the mean of the two numbers.

In [None]:
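def average_salary(string):
    # Strip the currency symbol and thousands separators:
    # '$78,500 - $102,000' -> '78500 - 102000'
    string = string.replace('$', '').replace(',', '')
    # Split into the two bounds; astype(float) tolerates the surrounding spaces
    bounds = np.array(string.split('-')).astype(float)
    return np.mean(bounds)

# np.vectorize applies the function element-wise (a convenience, not a speed-up)
average_salary = np.vectorize(average_salary)
avg_salaries = average_salary(salaries)
avg_salaries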

array([ 90250., 160000.,  76500., 160000., 202500., 102500., 115000.,
       150000., 171500., 215000.,  73250., 140000., 127500.,  90000.])
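
The same cleaning steps can also be expressed with NumPy's element-wise string functions instead of `np.vectorize`. This is only a sketch of an alternative, shown on the first three salary strings; the names `cleaned`, `parts`, `low`, and `high` are introduced here for illustration.

```python
import numpy as np

salaries = np.array(['$78,500 - $102,000', '$130,000 - $190,000',
                     '$62,000 - $91,000'])

# Remove '$' and ',' from every element at once
cleaned = np.char.replace(np.char.replace(salaries, '$', ''), ',', '')

# partition splits each element into (before, separator, after),
# producing an array of shape (n, 3)
parts = np.char.partition(cleaned, '-')
low = np.char.strip(parts[:, 0]).astype(float)
high = np.char.strip(parts[:, 2]).astype(float)

# The midpoint of each range
avg_salaries = (low + high) / 2
print(avg_salaries)
```

Keeping `low` and `high` as separate arrays may also be handy if the minimum or maximum salary is needed later.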

## 💸 Finding the average salary for each job title

In [None]:
# Finding the average salary for each job title
for job_title in unique_job_titles:
    print(job_title, np.mean(avg_salaries[job_titles == job_title]))

Back-End Developer 127500.0
Data Analyst 82500.0
Data Scientist 178000.0
Database Administrator 108750.0
Machine Learning Engineer 175000.0
Software Engineer 140000.0
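
The loop above is easy to read. For larger datasets, one possible loop-free variant combines `np.unique(..., return_inverse=True)` with `np.bincount`. The sketch below re-creates the arrays so it runs on its own; `group_ids`, `salary_sums`, `offer_counts`, and `mean_by_title` are illustrative names, not part of the original solution.

```python
import numpy as np

job_titles = np.array(['Data Analyst', 'Machine Learning Engineer',
                       'Data Analyst', 'Data Scientist',
                       'Data Scientist', 'Database Administrator',
                       'Database Administrator', 'Machine Learning Engineer',
                       'Data Scientist', 'Machine Learning Engineer',
                       'Data Analyst', 'Software Engineer',
                       'Back-End Developer', 'Data Analyst'])
avg_salaries = np.array([90250., 160000., 76500., 160000., 202500.,
                         102500., 115000., 150000., 171500., 215000.,
                         73250., 140000., 127500., 90000.])

# group_ids[i] is the position of job_titles[i] within unique_job_titles
unique_job_titles, group_ids = np.unique(job_titles, return_inverse=True)

# Sum the salaries per group, then divide by the group sizes
salary_sums = np.bincount(group_ids, weights=avg_salaries)
offer_counts = np.bincount(group_ids)
mean_by_title = salary_sums / offer_counts

for title, mean in zip(unique_job_titles, mean_by_title):
    print(title, mean)
```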


## 🤑 Finding the highest offer

In [None]:
highest_offer = np.max(avg_salaries)
print("Highest offer average salary: ", highest_offer)

Highest offer average salary:  215000.0


In [None]:
highest_offer_index = np.argmax(avg_salaries)
info = (companies[highest_offer_index], job_titles[highest_offer_index], salaries[highest_offer_index])
print("Highest offer info: ", info)

Highest offer info:  ('Google', 'Machine Learning Engineer', '$190,000 - $240,000')


## 📊 Sort the offers by average salary
We want to print the companies, job titles and salaries in ascending order of average salaries.

In [None]:
sorted_indices = np.argsort(avg_salaries)

for index in sorted_indices:
    print((companies[index], job_titles[index], salaries[index]))

('Capital One', 'Data Analyst', '$57,500 - $89,000')
('Walmart', 'Data Analyst', '$62,000 - $91,000')
('JPMorgan Chase', 'Data Analyst', '$70,000 - $110,000')
('Amazon', 'Data Analyst', '$78,500 - $102,000')
('Microsoft', 'Database Administrator', '$85,000 - $120,000')
('IBM', 'Database Administrator', '$95,000 - $135,000')
('Apple', 'Back-End Developer', '$110,000 - $145,000')
('Facebook', 'Software Engineer', '$120,000 - $160,000')
('Tesla', 'Machine Learning Engineer', '$120,000 - $180,000')
('Apple', 'Machine Learning Engineer', '$130,000 - $190,000')
('Airbnb', 'Data Scientist', '$130,000 - $190,000')
('Microsoft', 'Data Scientist', '$123,000 - $220,000')
('Netflix', 'Data Scientist', '$165,000 - $240,000')
('Google', 'Machine Learning Engineer', '$190,000 - $240,000')
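
If the highest offers should come first, one option is to sort on the negated values. The sketch below assumes the `companies` and `avg_salaries` arrays from above (re-created here so the block is self-contained) and picks the three best offers; `descending` and `top3` are illustrative names.

```python
import numpy as np

companies = np.array(['Amazon', 'Apple', 'Walmart', 'Airbnb', 'Netflix',
                      'Microsoft', 'IBM', 'Tesla', 'Microsoft', 'Google',
                      'Capital One', 'Facebook', 'Apple', 'JPMorgan Chase'])
avg_salaries = np.array([90250., 160000., 76500., 160000., 202500.,
                         102500., 115000., 150000., 171500., 215000.,
                         73250., 140000., 127500., 90000.])

# Negating the values turns the ascending sort into a descending one;
# kind='stable' keeps tied offers in their original order
descending = np.argsort(-avg_salaries, kind='stable')

top3 = descending[:3]
for index in top3:
    print((companies[index], avg_salaries[index]))
```

Reversing an ascending result with `[::-1]` would also work, but it flips the order of tied offers as well.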


## 👧🏼 Finding all the offers with a required experience of 3 years or less
We want to print the companies, job titles, years of experience and average salaries for all the offers that require 3 years of experience or less.

In [None]:
mask = years_of_experience <= 3

for index in np.flatnonzero(mask):
    print((companies[index], job_titles[index], years_of_experience[index], avg_salaries[index]))

('Amazon', 'Data Analyst', 2, 90250.0)
('Apple', 'Machine Learning Engineer', 3, 160000.0)
('Walmart', 'Data Analyst', 2, 76500.0)
('Capital One', 'Data Analyst', 2, 73250.0)
('Facebook', 'Software Engineer', 3, 140000.0)
('Apple', 'Back-End Developer', 3, 127500.0)
('JPMorgan Chase', 'Data Analyst', 3, 90000.0)
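
Boolean masks can be combined with `&` (and), `|` (or), and `~` (not), and a mask can index an array directly without a loop. The sketch below adds a second condition, an average salary of at least 100,000; that threshold is an arbitrary assumption chosen for illustration.

```python
import numpy as np

companies = np.array(['Amazon', 'Apple', 'Walmart', 'Airbnb', 'Netflix',
                      'Microsoft', 'IBM', 'Tesla', 'Microsoft', 'Google',
                      'Capital One', 'Facebook', 'Apple', 'JPMorgan Chase'])
years_of_experience = np.array([2, 3, 2, 5, 5, 5, 5, 5, 5, 5, 2, 3, 3, 3])
avg_salaries = np.array([90250., 160000., 76500., 160000., 202500.,
                         102500., 115000., 150000., 171500., 215000.,
                         73250., 140000., 127500., 90000.])

# Parentheses are required: & binds more tightly than <= and >=
mask = (years_of_experience <= 3) & (avg_salaries >= 100_000)

# Indexing with the mask keeps only the elements where it is True
print(companies[mask])
print(avg_salaries[mask])
```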


## 📦 Concatenate the arrays

Now let's combine all the information into a single two-dimensional array.
Each row will contain the company, job title, years of experience, salary range, and average salary.   
Because a NumPy array holds a single data type, the numeric columns are converted to strings. For example, the first row should look like this: `['Amazon' 'Data Analyst' '2' '$78,500 - $102,000' '90250.0']`

In [None]:
job_offers = np.column_stack((companies, job_titles, years_of_experience, salaries, avg_salaries))
print(job_offers)

[['Amazon' 'Data Analyst' '2' '$78,500 - $102,000' '90250.0']
 ['Apple' 'Machine Learning Engineer' '3' '$130,000 - $190,000'
  '160000.0']
 ['Walmart' 'Data Analyst' '2' '$62,000 - $91,000' '76500.0']
 ['Airbnb' 'Data Scientist' '5' '$130,000 - $190,000' '160000.0']
 ['Netflix' 'Data Scientist' '5' '$165,000 - $240,000' '202500.0']
 ['Microsoft' 'Database Administrator' '5' '$85,000 - $120,000'
  '102500.0']
 ['IBM' 'Database Administrator' '5' '$95,000 - $135,000' '115000.0']
 ['Tesla' 'Machine Learning Engineer' '5' '$120,000 - $180,000'
  '150000.0']
 ['Microsoft' 'Data Scientist' '5' '$123,000 - $220,000' '171500.0']
 ['Google' 'Machine Learning Engineer' '5' '$190,000 - $240,000'
  '215000.0']
 ['Capital One' 'Data Analyst' '2' '$57,500 - $89,000' '73250.0']
 ['Facebook' 'Software Engineer' '3' '$120,000 - $160,000' '140000.0']
 ['Apple' 'Back-End Developer' '3' '$110,000 - $145,000' '127500.0']
 ['JPMorgan Chase' 'Data Analyst' '3' '$70,000 - $110,000' '90000.0']]
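
In the stacked array every value is a string, so the numeric columns can no longer be used for arithmetic without converting them back. If the numeric types should be preserved, a structured array is one possible alternative. The sketch below uses the first three offers only; the field names (`company`, `job_title`, `years`, `avg_salary`) and the variable names are assumptions made for this example.

```python
import numpy as np

companies = np.array(['Amazon', 'Apple', 'Walmart'])
job_titles = np.array(['Data Analyst', 'Machine Learning Engineer',
                       'Data Analyst'])
years_of_experience = np.array([2, 3, 2])
avg_salaries = np.array([90250.0, 160000.0, 76500.0])

# One named field per column, each keeping the dtype of its source array
offer_dtype = np.dtype([
    ('company', companies.dtype),
    ('job_title', job_titles.dtype),
    ('years', years_of_experience.dtype),
    ('avg_salary', avg_salaries.dtype),
])

offers = np.empty(len(companies), dtype=offer_dtype)
offers['company'] = companies
offers['job_title'] = job_titles
offers['years'] = years_of_experience
offers['avg_salary'] = avg_salaries

# Numeric fields stay numeric, so they can be used directly
lowest = offers[np.argmin(offers['avg_salary'])]
print(lowest['company'], lowest['avg_salary'])
print(np.sort(offers, order='avg_salary')['company'])
```

For anything beyond small exercises, a tabular library such as pandas is usually a more convenient fit for mixed-type data.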
