## Covid-19, Kevin Systrom and Hosting Auto-Updating Dashboards on Github

### *Six Themes in Search of a Presentation*

#### Ashutosh Sanzgiri
#### YASS
#### July 31, 2020

#### The story begins in mid-April, in the early days of the pandemic:

#### This was a heady time for data science with lots of data, analyses, and visualizations floating around...

#### We created a slack channel called <span style="color:red">covid_data_science</span> to discuss and make sense of what was going on.

#### Early morning on April 14th, Sam Seljan posted this:



<p align="center"><img src="yass_covid/img1.png" /> </p> 

#### Kevin Systrom is the co-founder of Instagram.
<p align="center"><img src="yass_covid/kevin.jpg" width="200"/> </p> 

<p align="center"><img src="yass_covid/img2.png" /> </p> 

* On his github repo, Kevin had posted a Jupyter notebook to estimate the effective reproduction number (R_t) for covid-19 using a real-time Bayesian analysis.

* He had done this for US states. Within a few hours, I modified it to run for all US counties by state, starting with Oregon.
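
To make the idea of a real-time Bayesian estimate concrete, here is a minimal sketch of a single update step. It is not Kevin's notebook code. It assumes the Poisson model in which today's expected case count is yesterday's count times `exp(GAMMA * (R_t - 1))`, and the value `GAMMA = 1 / 7`, the grid `R_GRID`, and all function names are illustrative assumptions.

```python
import math

GAMMA = 1 / 7                            # assumption: 1 / serial interval in days
R_GRID = [i / 100 for i in range(801)]   # candidate R_t values from 0.00 to 8.00


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k events at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posterior_rt(prev_cases, new_cases, prior=None):
    """One Bayesian update over R_GRID from yesterday's and today's new cases."""
    if prev_cases <= 0:
        raise ValueError("prev_cases must be positive")
    if prior is None:
        prior = [1.0 / len(R_GRID)] * len(R_GRID)   # flat prior over the grid
    log_post = []
    for r, p in zip(R_GRID, prior):
        lam = prev_cases * math.exp(GAMMA * (r - 1))  # expected cases today
        # Floor the prior so a chained update never takes log(0).
        log_post.append(poisson_logpmf(new_cases, lam) + math.log(max(p, 1e-300)))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # stable exponentiation
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_rt(posterior):
    """Grid value with the highest posterior probability."""
    best = max(range(len(R_GRID)), key=lambda i: posterior[i])
    return R_GRID[best]
```

Passing one day's posterior as the `prior` for the next day chains the updates. Flat case counts put the most likely value at 1, and growing counts push it above 1.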

<p align="center"><img src="yass_covid/img3.png" /> </p> 

<p align="center"><img src="yass_covid/img5.png" width="600" /> </p> 

* Later that evening, I modified the code to estimate R_t by country and submitted a PR on the k-sys/covid-19 repo.

<p align="center"><img src="yass_covid/img4.png" /> </p> 

* That PR is still languishing unmerged...

<p align="center"><img src="yass_covid/img6.png" /> </p> 

* We will come back to this story later. For now, let's talk about Jupyter notebooks on Github.

* Github can render Jupyter notebooks, but you can't run them there.

* *External options exist, such as Binder & Google Colab*

* Unless you keep pushing updates, any time-dependent analysis and plots quickly get stale.

* This is the case with Kevin's original notebook: https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb. The plots have not been updated since the end of April.

* What if you could auto-update your Jupyter notebooks on Github, i.e. host a dashboard that gets updated on a schedule?

* It does not sound too complicated, but a simple, easy-to-use solution was not available until early this year.

* Github has offered static website hosting since 2008.

* It is called Github Pages: you can host your personal blog and connect it to a custom domain.

* A Ruby tool called Jekyll lets you create blog posts in Markdown syntax and auto-deploy your website whenever you push your commits.

#### Enter Fastpages

* Created by Hamel Husain (ML engineer @ Github) & Jeremy Howard (FastAI, #MasksForAll) in early 2020.

* Lots of enhancements to the blogging functionality provided by Jekyll.

* Provides a template that lets you set up your personal website in minutes.

* In particular for data science, it allows you to automatically convert your Jupyter notebook to a blog post and host it on Github.

#### Fastpages is powered by Github Actions

* Github Actions is a tool that lets you run a custom workflow on Github events such as push, issue creation or new release tag, or on a cron-like schedule.

* A workflow can be arbitrary code that runs in Github's cloud environment (Github-hosted runners are VMs on Azure). If necessary, you can host the runner on your cloud provider of choice or even on your laptop.

* Your actions live alongside your code in the same repo. And a failed action can create an issue in your repo :-)
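
As a rough illustration of "actions live alongside your code", the sketch below generates a scheduled workflow file under `.github/workflows/`. The workflow name, the job name, the cron expression and the `python refresh.py` command are all hypothetical placeholders, and this is not the workflow that Fastpages ships.

```python
from pathlib import Path

# Minimal scheduled workflow. Everything in braces is filled in by render_workflow.
WORKFLOW_TEMPLATE = """\
name: refresh-dashboard
on:
  schedule:
    - cron: "{cron}"
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: {command}
"""


def render_workflow(cron, command):
    """Return the workflow YAML for a cron schedule and a shell command."""
    return WORKFLOW_TEMPLATE.format(cron=cron, command=command)


def write_workflow(repo_root, filename, text):
    """Write the workflow into <repo_root>/.github/workflows/ and return its path."""
    path = Path(repo_root) / ".github" / "workflows" / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path


if __name__ == "__main__":
    # "0 13 * * 6" means 13:00 UTC every Saturday (schedules are evaluated in UTC).
    yaml_text = render_workflow("0 13 * * 6", "python refresh.py")
    print(write_workflow(".", "refresh.yml", yaml_text))
```

Committing the generated file to the repo is what registers the schedule. The same file could just as well be written by hand.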

#### More on Github Actions

* Free compute from Github (decent usage/resource limits, even for free accounts) that works for many use cases.

* No need to set up and schedule jobs elsewhere (e.g. free AWS / GCP instances).

#### Github Actions is a Magical Tool

* There is a growing list of community-contributed actions: https://github.com/sdras/awesome-actions

* Most/all of the DevOps use cases are supported.

* For data science:

    * Your Jupyter notebooks can be rerun on schedule and converted to HTML, to build an auto-updating dashboard.
    * Very complex workflows can validate your model before releasing it to production.
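
The rerun-and-convert step can be sketched with the `jupyter nbconvert` command line, which can execute a notebook and export it to HTML in one call. This is an illustrative sketch only: Fastpages has its own conversion pipeline, and the function names and the `docs` output directory here are assumptions.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook, output_dir="docs"):
    """Command that re-executes a notebook and writes it out as HTML."""
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",                      # rerun every cell before exporting
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def refresh_dashboard(notebook, output_dir="docs"):
    """Run the conversion and return the path of the generated HTML file."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True makes a failing notebook fail the calling job as well.
    subprocess.run(build_nbconvert_command(notebook, output_dir), check=True)
    return Path(output_dir) / (Path(notebook).stem + ".html")
```

A scheduled workflow would call something like `refresh_dashboard("rt_by_state.ipynb")` (a hypothetical notebook name) and then commit or publish the resulting HTML.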

#### What else can Github Actions be used for?

#### Jeopardy Scraper

* Scrapes questions from https://j-archive.com/ every Saturday morning and updates a dataset on S3, which feeds into my streamlit app: http://34.218.62.118:8501/

#### Spotify Playlist Updater

* Not tried it, but cool idea: https://github.com/swinton/SpotHub
* You could use it to scrape tracks and post a new playlist on your spotify account

#### And something related to Data Science :-)

#### Running ML models and submitting entries to contests

* With this one, I actually ran into the memory limit (7 GB) on the free Github VM.
* I had to use a self-hosted runner, and eventually got it running on my personal Mac.

#### Coming back to the original story

* I had set up a bunch of dashboards (auto-updating Jupyter notebooks) using Fastpages: https://sanzgiri.github.io/covid-19-dashboards/

* In particular, I had created one that calculates R_t by state in India.

* On 4/19, Rishabh Tyagi, a researcher at IIPS Mumbai, looking to do a statewise analysis for India, reached out.

<p align="center"><img src="yass_covid/img7.png" /> </p> 

#### Start of a productive collaboration

* Built a regression model to explain state-level variation in R_t, based on implemented governmental interventions, lockdown measures, testing rate, and health infrastructure.

* Recently extended the analysis to district-level.

* Our research paper has been submitted to the Journal of Population Research (Springer).

* Preprint available on [ResearchGate](https://www.researchgate.net/publication/342588687_Estimation_of_Effective_Reproduction_Numbers_for_COVID-19_using_Real-Time_Bayesian_Method_for_India_and_its_States) & [ResearchSquare](https://www.researchsquare.com/article/rs-45937/v1)
<p align="center"><img src="yass_covid/img8.png" /> </p>

#### Key Takeaways

* FastPages is a quick and easy way to host a blog using Github.

* FastPages also allows you to create a blog post from your Jupyter notebook, making it very easy for data scientists to share their work.

* Github Actions gives you the ability to auto-update your Jupyter notebooks on a schedule and create a dashboard.

* Github Actions is a very powerful tool that you can use to create fun / useful workflows.

* You can also create presentations using Jupyter notebooks (and host them on Github, which I will do after this presentation :-))

#### Acknowledgements

* Sam Seljan for the slack post and tweet that triggered this whole activity.
* Kevin Systrom for his amazing & well-documented notebook on which our analysis was based.
* Hamel Husain & Jeremy Howard for creating FastPages and ML use cases for Github Actions.
* Rishabh Tyagi for our wonderful collaboration.

#### References:

* Github Pages: https://pages.github.com/
* FastPages: https://github.com/fastai/fastpages
* Github Actions: https://github.com/features/actions
* Creating auto-updating notebooks & dashboards with Github Actions: https://sanzgiri.github.io/jupyter/2020/04/15/fastpages-jupyter-notebooks.html
* Github Actions for ML: https://fastpages.fast.ai/actions/markdown/2020/03/06/fastpages-actions.html
* Awesome Actions: https://github.com/sdras/awesome-actions
* How this presentation was created: https://corpwiki.xandr-services.com/display/~asanzgiri/Creating+Presentations+form+Markdown

#### Presentation revealed as a Jupyter Notebook

* Powered by reveal.js
