better visual: set y axis breaks and percentage formatting
steve committed Mar 11, 2017
1 parent e7a75e2 commit cb7e713
Showing 4 changed files with 12 additions and 9 deletions.
5 changes: 3 additions & 2 deletions DSJobSkill.R
@@ -101,5 +101,6 @@ print(arrange(results$running, -count))
results$running$count<-results$running$count/results$num_jobs
jt <- str_replace_all(job_title, '\\+|\\\"', ' ')
loc <- str_replace_all(location, '\\%2C+|\\+',' ')
-ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
-labs(x = 'Skill', y = 'Count', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p <- ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
+labs(x = 'Skill', y = 'Occurrences (%)', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p + scale_y_continuous(labels = scales::percent, breaks = seq(0,1,0.1))
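The two context lines that build `jt` and `loc` turn the URL-encoded `job_title` and `location` back into readable text for the plot title. The sketch below shows what they produce, assuming hypothetical inputs `'"data+scientist"'` and `'New+York%2C+NY'` (the script's real values are not part of this diff). Note that in the pattern `'\\%2C+'` the `+` is a regex quantifier on `C`, so the encoded comma and the following literal `+` are each replaced by a space. For comparison, it also shows base R's `utils::URLdecode`, which is a different technique from the one in the commit.

```r
library(stringr)

# Hypothetical URL-encoded inputs (assumptions, not the script's actual values)
job_title <- '"data+scientist"'
location  <- 'New+York%2C+NY'

# The commit's patterns: '+' and '"' become spaces in the title;
# '%2C' (an encoded comma) and '+' become spaces in the location
jt  <- str_replace_all(job_title, '\\+|\\\"', ' ')
loc <- str_replace_all(location, '\\%2C+|\\+', ' ')
# jt  is now " data scientist "  (each quote became a space)
# loc is now "New York  NY"      (two spaces: one for %2C, one for +)

# Alternative, not used in the commit: form-decode the string.
# '+' is replaced first so that an encoded plus (%2B) would survive decoding.
loc_decoded <- utils::URLdecode(gsub("+", " ", location, fixed = TRUE))
# loc_decoded is now "New York, NY"
```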
5 changes: 3 additions & 2 deletions DSJobSkill.Rmd
@@ -199,8 +199,9 @@ jt <- str_replace_all(job_title, '\\+|\\\"', ' ')
loc <- str_replace_all(location, '\\%2C+|\\+',' ')
# Visualization
-ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
-labs(x = 'Skill', y = 'Count', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p <- ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
+labs(x = 'Skill', y = 'Occurrences (%)', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p + scale_y_continuous(labels = scales::percent, breaks = seq(0,1,0.1))
```

## Takeaway
11 changes: 6 additions & 5 deletions README.md
@@ -90,7 +90,7 @@ cat('Total job count: ', job_count)
```

```
-## Total job count: 87
+## Total job count: 88
```

Scraping the job links and the page links requires a deeper knowledge of HTML. I spent quite some time extracting those two parts. Jobs are under the HTML nodes _h2_ _a_; the links to the search-result pages are more complex, and I had to use XPath to find them. Knowing the basics of HTML/CSS is almost a must. Good lesson for me. Hadley actually pointed out a useful tool, [SelectorGadget](http://selectorgadget.com/), but I didn't find it effective on Indeed's website. Indeed's HTML appears to be unstructured; I'm not sure whether they do that on purpose to prevent scraping. Anyhow, the code is much simpler than the process of getting there.
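The approach described in the paragraph above, a CSS selector for the job links and XPath for the result-page links, can be sketched with rvest. This is a minimal, self-contained sketch on a hypothetical, simplified HTML snippet: the `pagination` class and the `href` values are assumptions for illustration only, not Indeed's actual markup.

```r
library(rvest)

# Hypothetical, simplified search-results page (NOT Indeed's real markup)
html <- '
<html><body>
  <div class="result"><h2><a href="/job/101">Data Scientist</a></h2></div>
  <div class="result"><h2><a href="/job/102">Data Analyst</a></h2></div>
  <div class="pagination">
    <a href="/jobs?start=10">2</a>
    <a href="/jobs?start=20">3</a>
  </div>
</body></html>'
page <- read_html(html)

# CSS selector "h2 a": every <a> nested inside an <h2>, i.e. the job postings
job_nodes  <- html_nodes(page, "h2 a")
job_titles <- html_text(job_nodes)
job_links  <- html_attr(job_nodes, "href")

# XPath for the result-page links; the 'pagination' class is an assumption
page_links <- html_attr(
  html_nodes(page, xpath = "//div[@class='pagination']/a"), "href")
```

On a live site, relative links like these would still need to be joined to the site's base URL (for example with `xml2::url_absolute()`) before they can be fetched.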
@@ -202,11 +202,11 @@ print(arrange(results$running, -count))

```
## skill count
-## 1 R 52
+## 1 R 53
## 2 Python 47
## 3 SQL 36
## 4 Java 32
-## 5 SAS 22
+## 5 SAS 23
## 6 Excel 22
## 7 Hadoop 21
## 8 Spark 17
@@ -228,8 +228,9 @@ jt <- str_replace_all(job_title, '\\+|\\\"', ' ')
loc <- str_replace_all(location, '\\%2C+|\\+',' ')

# Visualization
-ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
-labs(x = 'Skill', y = 'Count', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p <- ggplot(results$running, aes(reorder(skill,-count), count)) + geom_bar(stat="identity") +
+labs(x = 'Skill', y = 'Occurrences (%)', title = paste0('Skill occurrences(%) for ', jt, ' in ', loc))
+p + scale_y_continuous(labels = scales::percent, breaks = seq(0,1,0.1))
```

![plot of chunk unnamed-chunk-8](figure/unnamed-chunk-8-1.png)
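What this commit changes in the chart: `count` is already divided by `results$num_jobs` (the `results$running$count<-...` line in the DSJobSkill.R hunk), so each bar holds a share between 0 and 1. `breaks = seq(0, 1, 0.1)` places a tick every 0.1, and `labels = scales::percent` prints those ticks as percentages. A minimal sketch, using hypothetical toy numbers in place of `results$running` and `results$num_jobs`:

```r
library(ggplot2)
library(scales)

# Hypothetical toy data standing in for results$running and results$num_jobs
running  <- data.frame(skill = c("R", "Python", "SQL"),
                       count = c(50, 40, 30))
num_jobs <- 100

# Same normalisation as the script: raw counts -> share of jobs (0..1)
running$count <- running$count / num_jobs

p <- ggplot(running, aes(reorder(skill, -count), count)) +
  geom_bar(stat = "identity") +
  labs(x = "Skill", y = "Occurrences (%)")

# A tick every 0.1 on the 0..1 scale, labelled as percentages;
# breaks beyond the data range (here 0.6 and up) are not drawn
p <- p + scale_y_continuous(labels = percent, breaks = seq(0, 1, 0.1))
```

Calling `print(p)` draws the chart. In an R Markdown chunk, as in the Rmd hunk, a visible value at the end of the chunk is printed automatically, which is why the commit's final line `p + scale_y_continuous(...)` needs no explicit `print()` there.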
Binary file modified figure/unnamed-chunk-8-1.png