# ETL Project: Landing the Google Interview

   #### Exploring preferred qualifications for jobs at Google











## General Overview 

Google has a reputation for hard interviews. Yet with high pay and excellent perks, people are lining up at the door to try to get a job there. Because of that supply and demand, Google can afford to be extremely picky. So, if you want to work at Google, which skills should you work on to reach that goal? Let's take a look.

## Objective 

The goal of my project is to analyze Google's job postings and their preferred qualifications to get a picture of what the ideal candidate looks like, i.e., which qualifications are mentioned most often for a given job category. So if you have your sights set on a job at Google, my analysis shows the most commonly listed skill sets to help guide the next steps of your skill acquisition. (I should probably take my own advice here.)

## Methodology 

I found a .CSV data set on Kaggle.com. Link: (https://www.kaggle.com/niyamatalmass/google-job-skills). The data set is a web scrape of Google's job postings taken from '16 to '17. The user who posted it used Selenium to scrape the data.

## Steps

### Data Frame

To perform the analysis, I loaded the .CSV into a Pandas data frame. The raw data looked like this: 

![Screen%20Shot%202019-03-29%20at%208.28.53%20PM.png](attachment:Screen%20Shot%202019-03-29%20at%208.28.53%20PM.png)
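As a rough sketch of the loading step, the snippet below reads a CSV into a Pandas data frame. The tiny inline CSV and its column names are assumptions made up for illustration; with the real download you would pass the file path (for example a hypothetical `job_skills.csv`) to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the downloaded file.
# The column names here are assumptions, not taken from the real data set.
sample_csv = io.StringIO(
    "Title,Category,Preferred Qualifications\n"
    'Software Engineer,Software Engineering,"PhD in Computer Science.\nExperience with Python."\n'
    "Brand Manager,Marketing,MBA preferred.\n"
)

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(sample_csv)
```

Note that a quoted cell can contain line breaks, which is one reason EOL characters need stripping before any text analysis.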

### Data Cleaning & Transforming 

Next, I cleaned the data so that I could run my text analysis on the resulting dataframe. I used a function to strip out punctuation, EOL characters, and stop words. I took a list of common stop words from Stack Overflow so that I had a ready-made list of words to strip out; building one by hand would have been quite a manual process.

After removing the words I didn't want in the analysis, I added the cleaned columns to my data frame. This left the bare-bones nouns and verbs I was looking for to run my text analysis. It doesn't look like a big change, but I managed to strip out a lot of unnecessary words and transform each cell into a list of strings.

![Screen%20Shot%202019-03-29%20at%208.34.44%20PM.png](attachment:Screen%20Shot%202019-03-29%20at%208.34.44%20PM.png)
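The cleaning function itself isn't shown above, so here is a minimal sketch of what such a function might look like. The name `clean_text` and the short `STOP_WORDS` set are hypothetical; the original used a longer stop-word list, and its exact behavior may differ.

```python
import string

# Hypothetical, abbreviated stop-word list (the original list was longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to", "with"}


def clean_text(text):
    """Return the remaining words in `text` as a list of strings."""
    # Treat missing values (e.g. NaN floats from Pandas) as empty text.
    if not isinstance(text, str):
        return []
    # Replace EOL characters with spaces so words on adjacent lines stay separate.
    text = text.replace("\r", " ").replace("\n", " ")
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and drop stop words (case-insensitive comparison).
    return [word for word in text.split() if word.lower() not in STOP_WORDS]


cleaned = clean_text("Experience with Python, C/C++ and Go.\nPhD in Computer Science")
```

One side effect worth knowing about: deleting punctuation turns a token like `C/C++` into `CC`. Applied to a whole column, a function like this could be used as `df["some_column"].apply(clean_text)`, where the column name is whatever your data frame uses.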

### SQL Database

After cleaning the data, it was time to upload it to an SQL database. Using the Postgres Admin client, I uploaded my new dataframe into an SQL database, which is shown here.

![Screen%20Shot%202019-03-29%20at%2010.09.42%20PM.png](attachment:Screen%20Shot%202019-03-29%20at%2010.09.42%20PM.png)

### Connecting to SQL Database & Building a New Dataframe for Analysis 

I wanted to run my analysis strictly on the preferred qualifications for "Software Engineering" at Google, so I set up a query against the SQL database to pull the data directly into Jupyter Notebook. Here's my database query:

    group = engine.execute("SELECT preferred_qualifications FROM skilldetails WHERE category='Software Engineering'")
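To try this kind of query without a running Postgres server, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. The table and column names come from the query above; the in-memory database, the sample rows, and the helper `fetch_preferred` are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skilldetails (category TEXT, preferred_qualifications TEXT)")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO skilldetails VALUES (?, ?)",
    [
        ("Software Engineering", "PhD Computer Science Python"),
        ("Marketing", "MBA brand strategy"),
        ("Software Engineering", "Python Go distributed systems"),
    ],
)


def fetch_preferred(conn, category):
    """Return the preferred_qualifications text of every posting in `category`."""
    # A bound parameter (?) keeps the category value out of the SQL string itself.
    rows = conn.execute(
        "SELECT preferred_qualifications FROM skilldetails WHERE category = ?",
        (category,),
    )
    return [row[0] for row in rows]


software_rows = fetch_preferred(conn, "Software Engineering")
```

Passing the category as a parameter also makes it easy to rerun the same analysis for a different job category.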

After pulling my data, I built a new data frame. I looped through the rows for the "Software Engineering" job category and counted how many times each unique word appeared, putting the words and their corresponding counts into two lists. I used the two lists to create a new dataframe to drive my analysis.

![PrefSkillsWordCount.png](attachment:PrefSkillsWordCount.png)
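One way to get a count per distinct word is `collections.Counter` from the standard library. This is a sketch rather than the loop I actually wrote, and the sample rows are made up for illustration.

```python
from collections import Counter

# Made-up cleaned rows, each already a list of words (illustration only).
rows = [
    ["PhD", "Computer", "Science", "Python"],
    ["Python", "Go", "distributed", "systems"],
    ["Experience", "Python", "PhD"],
]

# Count every occurrence of every word across all rows.
counts = Counter(word for row in rows for word in row)

# Split the result into two parallel lists, most frequent word first.
words = [word for word, _ in counts.most_common()]
totals = [count for _, count in counts.most_common()]
```

The two lists can then be turned into a data frame, for example with `pd.DataFrame({"word": words, "count": totals})`, where the column names are placeholders.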

What I noticed here is that I still had a lot of redundant or otherwise irrelevant words in the data, such as "Experience," "related," and "nan." I chose to manually remove all rows that contained such words. I also noticed that because I had sorted my dataframe in ascending order, the index was out of order, so I reset the index as well.

This resulted in the following dataframe: 

![Reset_Index_Preferred_Skills.png](attachment:Reset_Index_Preferred_Skills.png)
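The removal and index reset can be sketched in Pandas roughly as follows. The column names, the sample counts, and the `unwanted` list are assumptions for illustration, and the sort direction here is chosen only to put the largest counts first.

```python
import pandas as pd

# Made-up word counts (illustration only); the column names are assumptions.
df = pd.DataFrame({
    "word": ["Python", "Experience", "PhD", "nan", "related", "Go"],
    "count": [7, 14, 11, 3, 2, 4],
})

# Hypothetical list of words judged irrelevant by hand.
unwanted = ["Experience", "related", "nan"]

# Keep only the rows whose word is not in the unwanted list.
filtered = df[~df["word"].isin(unwanted)]

# Sort by count, then rebuild the index as 0..n-1 and discard the old labels.
result = filtered.sort_values("count", ascending=False).reset_index(drop=True)
```

Without `drop=True`, `reset_index` would keep the old index as an extra column instead of discarding it.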

### Analysis 

After building my new dataframe, I plotted a bar chart of the 40 most common words in the Preferred Qualifications of all Software Engineering roles at Google.

![Top%2040%20Preferred%20Skills%20Clean.png](attachment:Top%2040%20Preferred%20Skills%20Clean.png)

Here is the breakdown of all the job postings on Google's job board by category. At the time of the web scrape, there were only 31 Software Engineering jobs available at Google.

![Google%20Jobs%20by%20Category.png](attachment:Google%20Jobs%20by%20Category.png)

It's important to take the number of jobs available into account and compare it to the word-count chart above. Of the 31 Software Engineering jobs available, these are the Preferred Qualifications that stood out to me as marks of the ideal candidate:

Experience - 14 times (Duh) 

Technical - 13 times

PhD - 11 times

Computer Science - 11 times 

Design - 9 times 

Python - 7 times

Development - 7 times

Systems - 6 times

Code - 6 times

Coding - 4 times

Languages - 5 times (implying multiple languages) 

C - 5 times

JavaScript - 5 times

Programming - 5 times

GO - 4 times

CC - 4 times

MS - 4 times (Master of Science)

Masters - 4 times 

etc. 

### Conclusion

If you want to work at Google as a Software Engineer, it might be time to go back to school and get your PhD in Computer Science. If you don't want to spend the time doing that, a Master's degree in Computer Science may add some weight to your resume. While you're at it, it would be a good idea to learn multiple languages, such as Python, JavaScript, Go, and C.
