An introduction to data analysis with Pandas & Jupyter notebook

https://github.com/spbail/pandas-and-jupyter-workshop

This workshop is aimed at Python users with no prior knowledge of Pandas. We will explore a small dataset and introduce the basics of data analysis workflows using the Pandas library and Jupyter Notebook. After the workshop, learners will know how to load data from a CSV file, do some basic exploratory data analysis and data cleaning, generate simple statistics, and create some basic data visualizations.
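As a rough preview of that workflow, here is a minimal sketch of the load, explore, clean, and summarize steps. It uses a tiny in-memory CSV so it runs without any files; the column names (`patient_id`, `start_date`, `drug`, `dosage_mg`) and values are made up for illustration and are not taken from the workshop's data file or notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for a small CSV file. These column names and values
# are illustrative only and are NOT taken from mock_treatment_starts_2016.csv.
csv_text = """patient_id,start_date,drug,dosage_mg
1,2016-01-04,drug_a,100
2,2016-01-15,drug_b,
3,2016-02-02,drug_a,150
4,2016-02-20,drug_b,200
"""

# Load: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_date"])

# Explore: table dimensions and the number of missing values per column.
n_rows, n_cols = df.shape
missing_per_column = df.isna().sum()

# Clean: drop rows where the dosage is missing.
clean = df.dropna(subset=["dosage_mg"])

# Summarize: mean dosage per drug, and number of rows per calendar month.
mean_dosage = clean.groupby("drug")["dosage_mg"].mean()
starts_per_month = df.groupby(df["start_date"].dt.month).size()

print(mean_dosage)
print(starts_per_month)
```

In a notebook, calling `.plot()` on a result such as `starts_per_month` would be one way to get a quick chart, assuming matplotlib is installed.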

Materials by Sam Bail @spbail, based on a workshop by Alda Pontes.

Prerequisites for the workshop

You will need a working knowledge of Python to follow along with the workshop. If you are an absolute beginner in Python and aren't familiar with Python syntax, this workshop might not be suited for you.

OPTION 1: Binder link to run remote notebook

Binder is a web-based hub for Jupyter notebook. If your local setup does not work or if you prefer not to install anything locally, you can use the link here to work in a notebook on Binder. Please note that Binder will delete your notebook instance after 12 hours. You can download the notebook to your local machine at the end to have your own copy!

Click this icon to launch the notebook:

OPTION 2: Setup to run the notebook locally

Step 0: Download the materials

Clone this git repo to your machine and move the notebook copy you downloaded from Binder into the repo directory
Or start over with the default version of the notebook in the repo

Step 1: Make sure you are running a recent version of Python

I'm using a miniconda installation with Python 3.7
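If you are unsure which version you are running, a small check like the following can help. This is only a sketch: treating 3.7 as the minimum is an assumption based on the setup mentioned above, not a documented requirement of the workshop, and `python_is_recent_enough` is a hypothetical helper name.

```python
import sys

# Assumed minimum, based on the Python 3.7 setup mentioned above.
MIN_VERSION = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


status = "OK" if python_is_recent_enough() else "older than assumed minimum"
print("Python", sys.version.split()[0], status)
```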

Step 2: Install the necessary libraries for the workshop

Install the necessary libraries by running pip install -r requirements.txt in the repo directory
Do this in a new virtual environment (e.g. a new conda environment) if necessary
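After installing, you can confirm that the libraries are visible to your Python environment with a quick check like the one below. The package names listed here are assumptions for illustration; the authoritative list is whatever requirements.txt in the repo contains.

```python
import importlib.util

# Assumed package names for illustration; see requirements.txt for the real list.
ASSUMED_PACKAGES = ["pandas", "notebook", "matplotlib"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, re-running the pip install step inside the same environment that Jupyter uses is usually the first thing to try.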

Step 3: Make sure Jupyter Notebook runs

Open a terminal window in the directory where you downloaded the notebook and run: jupyter notebook
This should open a browser window. If it does not, go to http://localhost:8888/notebooks/

Step 4: Download the data file

Download the mock_treatment_starts_2016.csv file from this repo. NOTE: The data is entirely made up and is in no way related to any real patient data.

About Sam

Hi, I'm Sam! I am a data professional with experience working with healthcare data and building data infrastructure tools. I draw from a large toolkit ranging from various SQL flavors to Python, Pandas, Jupyter Notebook and R to statistical methods, data science and data visualization (Tableau, Superset...), as well as clinical terminologies and software engineering and automation tools - whatever gets the job done.

I completed a PhD in theoretical semantic web foundations at the School of Computer Science, The University of Manchester, UK. My thesis focused on exploring and exploiting the "justificatory structure" of OWL ontologies. While in the UK, I co-founded and led "Manchester Girl Geeks", a volunteer-based community organization that has been running STEM workshops for girls and women in the area since 2009.

https://www.twitter.com/spbail

https://www.linkedin.com/in/spbail/
