Scrape and save data
mine-cetinkaya-rundel committed Jul 6, 2019
1 parent e816f1b commit fcd775a
Showing 7 changed files with 401 additions and 0 deletions.
50 changes: 50 additions & 0 deletions data-scrape.R
@@ -0,0 +1,50 @@
# load packages ----------------------------------------------------------------
library(rvest)
library(tidyverse)
library(glue)
library(tools)

# read schedule page -----------------------------------------------------------
page <- read_html("http://www.user2019.fr/talk_schedule/")

# extract table ----------------------------------------------------------------
tabs <- page %>%
  html_table(header = TRUE)

# function to process data -----------------------------------------------------
process_schedule <- function(day_tab, day_name){

  # remove unused columns ----
  raw <- day_tab %>% select(-2, -Slides)

  # create talks_long ----
  talks_long <- raw %>%
    slice(seq(1, nrow(raw), by = 2)) %>%
    mutate(info = as.character(glue("{Title} <br><br> _{Speaker}_")))

  # create talks_wide ----
  talks_wide <- talks_long %>%
    select(Time, Room, info) %>%
    pivot_wider(names_from = Room, values_from = info) %>%
    select(Time, `Concorde 1+2`, `Cassiopée`, `Caravelle 2`,
           `Saint-Exupéry`, `Ariane 1+2`, `Guillaumet 1+2`)

  # create abstracts_long ----
  abstracts_long <- raw %>%
    slice(seq(2, nrow(raw), by = 2)) %>%
    rename(Abstract = Time) %>%
    select(Abstract) %>%
    bind_cols(talks_long, .) %>%
    mutate(Day = toTitleCase(day_name)) %>%
    select(Day, Time, Title, Speaker, Abstract, Session, Room, Chair)

  # write out ----
  write_csv(talks_wide, glue("data/{day_name}_talks_wide.csv"))
  write_csv(abstracts_long, glue("data/{day_name}_abstracts_long.csv"))

}

# process days -----------------------------------------------------------------
process_schedule(tabs[[1]], "wed")
process_schedule(tabs[[2]], "thu")
process_schedule(tabs[[3]], "fri")
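
The script relies on each scraped day table alternating rows: odd rows carry the talk details, and even rows carry the abstract text in the `Time` column. The sketch below illustrates that split-and-recombine step on a small made-up table. It assumes that row layout, uses hypothetical data rather than anything from the schedule page, and is written in base R instead of the dplyr verbs used in `data-scrape.R`, so it runs without extra packages.

```r
# Toy stand-in for one scraped day table (hypothetical data, not from the site):
# odd rows hold talk details, even rows hold the abstract in the Time column.
day_tab <- data.frame(
  Time    = c("10:25", "Abstract of talk A", "10:30", "Abstract of talk B"),
  Title   = c("Talk A", "", "Talk B", ""),
  Speaker = c("Speaker A", "", "Speaker B", ""),
  stringsAsFactors = FALSE
)

# Odd rows: one row per talk.
talks <- day_tab[seq(1, nrow(day_tab), by = 2), ]

# Even rows: the abstract text sits in the Time column.
abstracts <- day_tab[seq(2, nrow(day_tab), by = 2), "Time"]

# Pair each talk with the abstract that followed it in the table.
combined <- cbind(talks, Abstract = abstracts, stringsAsFactors = FALSE)
rownames(combined) <- NULL

print(combined)
```

The pairing is purely positional, so it only holds if every talk row is followed by exactly one abstract row. A table with an odd number of rows, or with a missing abstract, would misalign the two halves.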
83 changes: 83 additions & 0 deletions data/fri_abstracts_long.csv

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions data/fri_talks_wide.csv
@@ -0,0 +1,15 @@
Time,Concorde 1+2,Cassiopée,Caravelle 2,Saint-Exupéry,Ariane 1+2,Guillaumet 1+2
09:15,Tools for Model-Based Clustering in R <br><br> _Bettina Grün_,NA,NA,NA,NA,NA
10:25,Native Chrome Automation using R <br><br> _Christophe DervieuxRomain Lesur_,fxtract - Feature Extraction from Grouped Data <br><br> _Quay Au_,Adjusting reviewer scores for a fairer assessment via multi-faceted Rasch modelling <br><br> _Caterina Constantinescu_,The transition from conventional tools in banking to R <br><br> _Balazi Peter_,rGSAn: a R package dedicated to the gene set analysis using semantic similarity measures. <br><br> _Aarón Ayllón-BenítezPatricia Thebault_,NA
10:30,Our journey with Shiny : some packages to enhance your applications <br><br> _Victor PerrierFanny Meyer_,Spatial Optimisation with OSRM and R <br><br> _Megan Beckett_,Penalized regressions to study multivariate linear models : the VariSel package. <br><br> _Marie Perrot-Dockès_,"R++, a new Graphical User Interface for R <br><br> _Christophe Genolini_",Pathway-VisualiseR: An Interactive Web Application for Visualising Gene Networks <br><br> _Goknur GinerAlexandra Garnham_,NA
10:35,auth0: Secure Authentication in Shiny with Auth0 <br><br> _Julio Trecenti_,Anomaly detection in trivago <br><br> _Peter Brejcak_,"Maximum spacing estimation, a new method in fitdistrplus <br><br> _Christophe Dutang_",R in Pharma: A tailored approach to converting programmers to R in an industry resistant to change <br><br> _Kieran Martin_,Compiling a global database of sapflow measurements with R: Workflow and tools for the SAPFLUXNET database <br><br> _Víctor Granda_,NA
10:40,Packaging shiny applications <br><br> _Maxim Nazarov_,Using R and the Tidyverse to Play Fantasy Baseball <br><br> _Angeline Protacio_,rama: an R interface to the GAMA agent-based modeling platform <br><br> _Marc Choisy_,Community Driven Data Science in Insurance <br><br> _Kevin Kuo_,Bayesian sequential integration within a preclinical PK/PD modeling framework using rstan package: Lessons learned <br><br> _Fabiola La Gamba_,NA
10:45,Photon : Building an electron-shiny app using a simple RStudio add in. <br><br> _Abbas Rizvi_,Optimizing children sleeping time using regression and machine learning <br><br> _Alicja Fras_,RcppGreedySetCover: Scalable Set Cover <br><br> _Kaeding Matthias_,unconfUROS and one of its outputs vornoiTreemap <br><br> _Alexander Kowarik_,VICI: a Shiny app for accurate estimation of Vaccine Induced Cellular Immunogenicity with bivariate modeling <br><br> _Boris Hejblum_,NA
10:50,Visualizing Huge Amounts of Fleet Data using Shiny and Leaflet <br><br> _Andreas Wittmann_,NA,The GPareto and GPGame packages for multi and many objective Bayesian optimization <br><br> _Mickaël Binois_,An R implementation of a model-based estimator – a UK case study <br><br> _Konstantinos Soulanis_,Tools for 3D/4D interactive visualisation of cells and biological tissue <br><br> _Marion Louveaux_,NA
10:55,NA,NA,NA,Using advanced R packages for the visualisation of clinical data in a cancer hospital setting <br><br> _Roxane Legaie_,Analysis of laboratory test requests in a university hospital: A Shiny App for association analysis as a demand management tool <br><br> _Deniz Topcu_,NA
11:30,How to win friends and write an open-source book <br><br> _Jakub NowosadRobin Lovelace_,Machine Learning Infrastructure at Netflix <br><br> _Savin Goyal_,prVis: a Novel Method for Visual Dimension Reduction <br><br> _Norman MatloffTiffany JiangWenxuan ZhaoRobert Tucker_,pak: a fresh approach to package installation <br><br> _Gábor Csárdi_,"timeseriesdb - Manage, Process and Archive Time Series with R and PostgreSQL <br><br> _Matthias Bannert_",Implementation and analysis design of an adaptive-outcome trial in R <br><br> _Alessio Crippa_
11:48,Making sense of CRAN: Package and collaboration networks <br><br> _Ioannis Kosmidis_,Deploying machine learning models at scale <br><br> _Angus Taylor_,PLS for Big Data: A Unified Parallel Algorithm for Regularized Group PLS <br><br> _Benoit Liquet_,Summary of developments in R's data.table package <br><br> _Arun Srinivasan_,A feast of time series tools <br><br> _Rob Hyndman_,Advances in dose-response analysis <br><br> _Christian RitzJens C. Streibig_
12:06,RWsearch: a package for CRAN users and task view maintainers <br><br> _Patrice Kiener_,Serverless Computing for R <br><br> _Christoph BodnerThomas Laber_,multiDA and genDA: Discriminant analysis methods for large scale and complex datasets <br><br> _Sarah Romanes_,Real-time file import with the vroom package <br><br> _Jim Hester_,tsbox: Class-Agnostic Time Series <br><br> _Christoph Sax_,The next generation of the survival package <br><br> _Terry Therneau_
12:24,"Translating datasets using ""datalang"": the development of ""datos"" package for the R4DS Spanish translation <br><br> _Riva Quiroga_",A DevOps process for deploying R to production <br><br> _David Smith_,compboost: Fast and Flexible Component-Wise Boosting Framework <br><br> _Daniel Schalk_,A Future for R: Simplified Parallel and Distributed Processing <br><br> _Henrik Bengtsson_,RJDemetra: an R interface to JDemetra+ seasonal adjustment software <br><br> _Alain Quartier-La-Tente_,A flexible approach to time-to-event data analysis using case-base sampling <br><br> _Jesse Islam_
12:42,R Consortium Working Groups <br><br> _Joseph Rickert_,Authentication and authorization in plumber with the sealr package <br><br> _Friedrike Preu_,How to speed-up VSURF (Variable Selection Using Random Forests)? <br><br> _Robin Genuer_,FastRCluster: running FastR from GNU-R <br><br> _Stepan Sindelar_,Experiences from dealing with missing values in sensor time series data <br><br> _Steffen Moritz_,The R package mixmeta: an extended mixed-effects framework for meta-analysis <br><br> _Antonio Gasparrini_
14:15,'AI for Good' in the R and Python ecosystems <br><br> _Julien Cornebise_,NA,NA,NA,NA,NA
