## Creating a Web Data Project with Jupyter Notebooks
******
###### 1. Read in the JSON file(s) produced by your work with requests/Selenium and BeautifulSoup

In [1]:
# Import necessary packages
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Read in json files and create dataframes
filename = '2017-04-03.StackOverFlow.json'
filename2 = '2017-04-04.StackOverFlow.json'
data = pd.read_json(filename)
data2 = pd.read_json(filename2)


###### 2. Clean the data column by column
- Ensure that addresses (or other text data) are consistent 
- Eliminate string characters from numeric values 
- Exclude redundant data as appropriate

In [2]:
# Strip the trailing newline that the scraper left in the Salary values
data['Salary'] = data['Salary'].replace({'\n': ''}, regex=True)
data2['Salary'] = data2['Salary'].replace({'\n': ''}, regex=True)
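
For the "eliminate string characters from numeric values" step, one possible approach is to split each salary string into numeric bounds. This is only a sketch under assumptions: the helper `parse_salary_range` and the column names `SalaryLow` and `SalaryHigh` are hypothetical, the sample strings are copied from the Salary values shown in this notebook, and the currency symbol is ignored, so dollar and euro amounts are not directly comparable.

```python
import re

import pandas as pd

# Matches ranges such as '85k - 125k'; the currency symbol is not captured
SALARY_PATTERN = re.compile(r'(\d+)k\s*-\s*(\d+)k')


def parse_salary_range(text):
    """Hypothetical helper: return (low, high) or (None, None) if no range is found."""
    match = SALARY_PATTERN.search(str(text))
    if match is None:
        return (None, None)
    low, high = match.groups()
    return (int(low) * 1000, int(high) * 1000)


# Sample values copied from the Salary column displayed in this notebook
sample = pd.DataFrame({'Salary': ['Not Listed', '$85k - 125k\n', '€30k - 40k\n']})

# Expand the (low, high) tuples into two new columns (names are assumptions)
bounds = pd.DataFrame(sample['Salary'].apply(parse_salary_range).tolist(),
                      columns=['SalaryLow', 'SalaryHigh'],
                      index=sample.index)
sample = sample.join(bounds)
```

Rows such as `Not Listed` end up with missing values in both new columns, which keeps them out of later numeric summaries.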

In [15]:
# Eliminate any potential duplicates
NoDupsData = data.drop_duplicates()
NoDupsData2 = data2.drop_duplicates()

In [13]:
# Display first dataframe
NoDupsData

Unnamed: 0,Employer,Job Title,Location,Salary,Tags,Time
0,GS&F,Senior Interactive Developer,"Nashville, TN",Not Listed,html5 sass node.js javascript web-stan...,2w ago
1,Synopsys,Managing Consultant - 12615BR,"Nashville, TN",Not Listed,security sdlc,5d ago
2,The Iron Yard,Instructor- Javascript (Fulltime),"Nashville, TN",Not Listed,reactjs javascript node.js,3w ago
3,LunarLincoln,Mobile App Developer,"Nashville, TN",Not Listed,swift android objective-c ios,2w ago
4,LiveSchool Inc.,Backend Engineer at Growing Edtech Company in ...,"Nashville, TN",Not Listed,amazon-web-services aws-lambda amazon-rds-...,2w ago
5,JetSmarter,Web Application Developer,"Fort Lauderdale, FL",$85k - 125k\n,user-experience css html5 javascript sql,< 1h ago
6,Kroll,Senior Software Engineer,"Nashville, TN",Not Listed,javascript angular2 sql-server asp.net c#,1w ago
7,Atlassian,Cloud Architect - JIRA (Java / ReactJS / AWS),"Sydney, Australia",Not Listed,saas java design,1h ago
8,Facebook,Developer Support Engineer,"Dublin, Ireland",Not Listed,java javascript objective-c,2h ago
9,Facebook,"Partner Engineering Manager, Games","London, UK",Not Listed,javascript objective-c php java .net,2h ago


In [14]:
# Display second dataframe
NoDupsData2

Unnamed: 0,Employer,Job Title,Location,Salary,Tags,Time
0,"Genscape, Inc",Senior Software Engineer - Applications and In...,"Louisville, KY",$90k - 120k\n,cloud azure amazon-web-services tibco ...,1h ago
1,Synopsys,Managing Consultant - 12615BR,"Nashville, TN",Not Listed,security sdlc,6d ago
2,The Iron Yard,Instructor- Javascript (Fulltime),"Nashville, TN",Not Listed,reactjs javascript node.js,3w ago
3,ERF Medien e.V.,PHP-Webentwickler (m/w),"Wetzlar, Deutschland",€30k - 40k\n,php mysql phpstorm git cvs,< 1h ago
4,LunarLincoln,Mobile App Developer,"Nashville, TN",Not Listed,swift android objective-c ios,2w ago
5,GS&F,Senior Interactive Developer,"Nashville, TN",Not Listed,html5 sass node.js javascript web-stan...,2w ago
6,Ansira,ColdFusion Developer,No office location,Not Listed,coldfusion,1h ago
7,"Booz Allen Hamilton, Inc","Software Engineer, Senior",,Not Listed,reactjs javascript java,2h ago
8,GLOBO,DevOps Engineer,"Wyncote, PA",Not Listed,devops amazon-web-services mysql redis ...,< 1h ago
9,"Booz Allen Hamilton, Inc",Software Engineer,,Not Listed,sql c# security,3h ago


***Questions of Interest***
 1. What language/skill is in the highest demand among employers? **Will Examine for Project**
 2. Where are most development jobs located? **Will Examine for Project**
 3. What language/skill brings in the highest salary? Might be tough to do since many job offerings do not list a salary.
 4. What companies are most frequently posting jobs?
 5. What jobs have the quickest turnover rate?
 6. What kind of jobs stay posted for the longest period of time?
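
Question 2 could be approached with a simple frequency count on the Location column. The sketch below uses a small illustrative frame rather than the real data; the variable names are assumptions, and treating `No office location` and empty values as "no location" is a choice made here, not something the dataset dictates.

```python
import pandas as pd

# Illustrative rows only; Location values follow the format seen in this notebook
jobs = pd.DataFrame({'Location': ['Nashville, TN', 'Nashville, TN',
                                  'Sydney, Australia', 'No office location',
                                  None, 'Nashville, TN']})

# Drop missing values and the remote-job placeholder before counting
located = jobs['Location'].dropna()
located = located[located != 'No office location']

# value_counts sorts from most to least frequent
location_counts = located.value_counts()
```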

###### 3. Merge dataframes from separate json files as appropriate
- Find the intersection of two (or more) sets
- Compare the intersection with the newer set to find 'New Products'  
    - When found, add the starting date
- Compare the intersection with the older set to find 'Closed Products'  
    - When found, add the closing date
    - Compare closing date with starting date to find days on market      
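
The steps above can be sketched with a single outer merge. `pd.merge` accepts `indicator=True`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The frames below are small illustrative subsets, and the column names `Starting Date` and `Closing Date` are assumptions; the dates are taken from the two file names.

```python
import pandas as pd

# Illustrative subsets of the older (2017-04-03) and newer (2017-04-04) scrapes
older = pd.DataFrame({
    'Employer': ['GS&F', 'JetSmarter', 'Synopsys'],
    'Job Title': ['Senior Interactive Developer', 'Web Application Developer',
                  'Managing Consultant - 12615BR'],
})
newer = pd.DataFrame({
    'Employer': ['GS&F', 'Synopsys', 'GLOBO'],
    'Job Title': ['Senior Interactive Developer', 'Managing Consultant - 12615BR',
                  'DevOps Engineer'],
})

keys = ['Employer', 'Job Title']
both = pd.merge(older, newer, on=keys, how='outer', indicator=True)

# Only in the older file: the posting was last shown on the older date
closed = both[both['_merge'] == 'left_only'].copy()
closed['Closing Date'] = pd.Timestamp('2017-04-03')

# Only in the newer file: the posting was first shown on the newer date
new = both[both['_merge'] == 'right_only'].copy()
new['Starting Date'] = pd.Timestamp('2017-04-04')
```

Rows marked `both` are the intersection, the same set an inner merge on these keys would return.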

In [7]:
# Create the inner merge of the two dataframes
dataInnerMerge = pd.merge(NoDupsData, NoDupsData2,
                          on=['Employer', 'Job Title'],
                          how='inner')

In [8]:
# Display the intersection of the two dataframes; postings in the newer
# dataframe that are absent from it are the 'new products'
dataInnerMerge

# Create the associated dataframe
 
    
# Add the starting date in the dataframe 
# in which the product is first shown


Unnamed: 0,Employer,Job Title,Location_x,Salary_x,Tags_x,Time_x,Location_y,Salary_y,Tags_y,Time_y
0,GS&F,Senior Interactive Developer,"Nashville, TN",Not Listed,html5 sass node.js javascript web-stan...,2w ago,"Nashville, TN",Not Listed,html5 sass node.js javascript web-stan...,2w ago
1,Synopsys,Managing Consultant - 12615BR,"Nashville, TN",Not Listed,security sdlc,5d ago,"Nashville, TN",Not Listed,security sdlc,6d ago
2,The Iron Yard,Instructor- Javascript (Fulltime),"Nashville, TN",Not Listed,reactjs javascript node.js,3w ago,"Nashville, TN",Not Listed,reactjs javascript node.js,3w ago
3,LunarLincoln,Mobile App Developer,"Nashville, TN",Not Listed,swift android objective-c ios,2w ago,"Nashville, TN",Not Listed,swift android objective-c ios,2w ago
4,LiveSchool Inc.,Backend Engineer at Growing Edtech Company in ...,"Nashville, TN",Not Listed,amazon-web-services aws-lambda amazon-rds-...,2w ago,"Nashville, TN",Not Listed,amazon-web-services aws-lambda amazon-rds-...,3w ago
5,Kroll,Senior Software Engineer,"Nashville, TN",Not Listed,javascript angular2 sql-server asp.net c#,1w ago,"Nashville, TN",Not Listed,javascript angular2 sql-server asp.net c#,1w ago
6,Atlassian,Cloud Architect - JIRA (Java / ReactJS / AWS),"Sydney, Australia",Not Listed,saas java design,1h ago,"Sydney, Australia",Not Listed,saas java design,yesterday
7,Facebook,Developer Support Engineer,"Dublin, Ireland",Not Listed,java javascript objective-c,2h ago,"Dublin, Ireland",Not Listed,java javascript objective-c,yesterday
8,Facebook,"Partner Engineering Manager, Games","London, UK",Not Listed,javascript objective-c php java .net,2h ago,"London, UK",Not Listed,javascript objective-c php java .net,yesterday
9,Integration X A/S,Senior System developer,No office location,Not Listed,java mysql xquery javascript,11h ago,No office location,Not Listed,java mysql xquery javascript,yesterday


In [6]:
# Determine the set of 'closed products'


# Create the associated dataframe


# Add the closing date in the dataframe 
# in which the product is last shown


# Find days on market by comparing the starting date 
# with the closing date
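
For this dataset, a starting date is not stored directly, but the Time column holds relative ages such as `2w ago`, `5d ago`, `< 1h ago`, or `yesterday`. A rough sketch of estimating days on market from those strings follows. The helpers `approximate_age_days` and `estimate_start_date` are hypothetical, ages under a day are treated as zero days, and the result is only as precise as the unit in the string.

```python
import re
from datetime import date, timedelta

# Days represented by each unit letter; hours are treated as "same day"
UNIT_DAYS = {'h': 0, 'd': 1, 'w': 7}


def approximate_age_days(text):
    """Hypothetical helper: convert '2w ago', '6 days ago', etc. to whole days."""
    text = text.strip().lower()
    if text == 'yesterday':
        return 1
    match = re.search(r'(\d+)\s*([hdw])', text)
    if match is None:
        return None  # unrecognized format; leave it for manual review
    return int(match.group(1)) * UNIT_DAYS[match.group(2)]


def estimate_start_date(scrape_date, text):
    """Estimate the posting date from the scrape date and the relative age."""
    age = approximate_age_days(text)
    return None if age is None else scrape_date - timedelta(days=age)


# A posting listed as '2w ago' in the 2017-04-03 file and last shown in that file
start = estimate_start_date(date(2017, 4, 3), '2w ago')
closing = date(2017, 4, 3)
days_on_market = (closing - start).days
```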

###### 4. Create visualizations
- Histogram of Prices
- Histogram of Days on Market
- Scatter Diagram of Prices vs Days on Market
- Pie Chart of New, Like New, Used

In [7]:
# Create a histogram of Prices


#Temporary Data for Days on Market ... for Scatter Plot Example


# Create a histogram of Days on Market


# Create a Scatter Diagram of Prices vs Days on Market


# Create a Pie Chart of New, Like New, and Used products
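
A minimal sketch of the first three charts is shown here, adapted to job data by using salary in place of price. Every number below is a placeholder invented only to exercise the plotting calls; none of it comes from the scraped files, and the variable names are assumptions.

```python
import matplotlib.pyplot as plt

# Placeholder values (NOT real data): salary midpoints in thousands
# and days on market for six imaginary postings
salary_midpoints_k = [105, 105, 35, 70, 95, 120]
days_on_market = [14, 5, 21, 7, 1, 3]

fig, (ax_salary, ax_days, ax_scatter) = plt.subplots(1, 3, figsize=(15, 5))

# Histogram of salaries (the 'Prices' chart in the list above)
ax_salary.hist(salary_midpoints_k, bins=5, alpha=0.5, color='green')
ax_salary.set_xlabel('Salary midpoint (thousands)')
ax_salary.set_ylabel('Postings')

# Histogram of days on market; hist returns the counts and the bin edges
counts, bin_edges, _ = ax_days.hist(days_on_market, bins=5, alpha=0.5, color='green')
ax_days.set_xlabel('Days on market')
ax_days.set_ylabel('Postings')

# Scatter diagram of salary against days on market
ax_scatter.scatter(days_on_market, salary_midpoints_k)
ax_scatter.set_xlabel('Days on market')
ax_scatter.set_ylabel('Salary midpoint (thousands)')

fig.tight_layout()
```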




###### 5. Trends from your data  
- Search for specific brands and offer counts for each
- Search for product types and offer counts for each
- For each of the above (and other) give counts for day over day or week over week
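
For job postings, the "brands" and "product types" above correspond most naturally to the skill tags. One way to count every tag at once and compare two days is sketched below. The helper `tag_counts` and the `Change` column are assumptions, and the sample rows are a handful of untruncated Tags values copied from the two dataframes shown earlier.

```python
import pandas as pd

# A few Tags values from each day's dataframe (subset, for illustration)
day1 = pd.DataFrame({'Tags': ['security sdlc',
                              'reactjs javascript node.js',
                              'java javascript objective-c']})
day2 = pd.DataFrame({'Tags': ['security sdlc',
                              'reactjs javascript node.js',
                              'php mysql phpstorm git cvs',
                              'reactjs javascript java']})


def tag_counts(frame):
    """Split the space-separated tags, one tag per row, then count each tag."""
    return frame['Tags'].str.split().explode().value_counts()


# Align the two days on tag name; a tag absent on one day counts as 0
trend = pd.DataFrame({'2017-04-03': tag_counts(day1),
                      '2017-04-04': tag_counts(day2)}).fillna(0).astype(int)
trend['Change'] = trend['2017-04-04'] - trend['2017-04-03']
```

Because each tag is split out before counting, `java` and `javascript` are tallied separately without any substring matching.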

In [24]:
# List the products representing specific brands of your choosing.
# Match each skill as a whole tag, so 'java' is found even when it is
# the last tag and is never confused with 'javascript'.
javaData = data[data['Tags'].str.contains(r'(?:^|\s)java(?:\s|$)', na=False)]
phpData = data[data['Tags'].str.contains(r'(?:^|\s)php(?:\s|$)', na=False)]

# Create a bar chart of products by brand (for the brands chosen)
javaCount = len(javaData)
phpCount = len(phpData)

objects = ('Java', '', 'PHP', '')
colors = 'green'

y_pos = np.arange(len(objects))
# Bar heights must be the counts, not the dataframes themselves
performance = [javaCount, 0, phpCount, 0]
fig = plt.figure(figsize=(15,7))
plt.bar(y_pos, performance, align='center', alpha=0.5, color=colors)
plt.xticks(y_pos, objects)
plt.ylabel('Counts')
plt.title('Brands')

plt.show()

# List the products of a specific type


# Create a bar chart of products by specific type
# for each date provided


In [5]:
# Show resulting dataframes of interest

In [9]:
data

Unnamed: 0,Employer,Job Title,Location,Tags,Time
0,Apex Energy Solutions,Frontend Developer,"Zionsville, IN",Found,4 hours ago
1,Veeva,"Java Engineer, Senior or Principal, CRM","Pleasanton, CA",Found,< 1 hour ago
2,Sonatype,Data Platform Software Engineer,No office location,Found,2 hours ago
3,LunarLincoln,Mobile App Developer,"Nashville, TN",Found,6 days ago
4,GS&F,Senior Interactive Developer,"Nashville, TN",Found,6 days ago
5,Aculocity,Software Engineer - ERP Systems,"Highland Park, IL",Found,< 1 hour ago
6,ProviderTrust,Java Developers at Nashville's Fastest Growing...,"Nashville, TN",Found,1 week ago
7,Elastic,Education Engineer / Technical Trainer - Germa...,No office location,Found,5 hours ago
8,"GX2 Systems, LCC",C++ Developer,"Chicago, IL",Found,2 hours ago
9,The Farmer's Dog,Senior Softwar Engineer,"Brooklyn, NY",Found,1 hour ago
