Website for Data Science for Everyone course at the University of Birmingham, 2019

Data science for everyone

Course materials and notes for the course Data Science for Everyone.

This textbook is based on the Berkeley Foundations of Data Science course. The most recent version of the course is at Computational and Inferential Thinking. The repository for the textbook is on Github.

Versions of the Berkeley course come from the last commit in that repository that is licensed with a Creative Commons CC-BY-NC license:

  • 64b20f0. The following commit (710ed4e) relicensed the work with a CC-BY-NC-ND license, forbidding derivative works.

Machinery

The template for this website comes from https://github.com/choldgraf/textbooks-with-jupyter - many thanks to Chris Holdgraf for putting that together.

Getting started for working on the repository

Say your Github username is my-gh-user.

Go to the repository page that houses this README - for example https://github.com/matthew-brett/dsfe.

Click on the "Fork" button near the top right to make your own fork of the repository, which will now be at https://github.com/my-gh-user/<repo-name> where <repo-name> is the name of the repository housing this README.

Before you clone the repository, make sure you are working in a case-sensitive filesystem. The default macOS filesystem is not case-sensitive. If you are on a Mac, see the section "Case-sensitive filesystem on the Mac" near the end before you continue, and clone into the new filesystem you make there.

The following assumes that the README is in https://github.com/matthew-brett/dsfe. The name of the repository is therefore dsfe. Substitute URL and repository name throughout.

Clone the main repo:

git clone https://github.com/matthew-brett/dsfe

Add a remote for your fork:

cd dsfe
git remote add my-gh-user https://github.com/my-gh-user/dsfe.git
git fetch my-gh-user

Get the submodules for the repository (you'll need these for the build):

git submodule update --init

Start by making a branch to work on, linked to your fork. Use a name to match the kind of changes you are about to make, like rewrite-intro-pages:

git branch rewrite-intro-pages
git checkout rewrite-intro-pages

Associate this branch with your fork:

git push my-gh-user rewrite-intro-pages -u

The -u flag above stores the association of this branch with your fork, referenced by my-gh-user.
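
To confirm that the association took, you can ask Git which upstream the current branch tracks. This is a minimal sketch; show_upstream is a helper name made up for illustration, not something in the repository. After the push above, it should print something like my-gh-user/rewrite-intro-pages.

```shell
# Print the upstream (remote/branch) that the current branch tracks,
# or a short message when no upstream is configured.
# show_upstream is a hypothetical helper name.
show_upstream() {
    git rev-parse --abbrev-ref --symbolic-full-name '@{u}' 2>/dev/null \
        || echo "no upstream set"
}

show_upstream
```

If you see "no upstream set", re-run the push command with the -u flag.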

Installing stuff for building / serving the repository files

If you use Conda then you might make a Conda environment for working on the repo. I don't, I use pip, and I make a virtual environment. You can do that like this:

python3 -m venv my-venv
source my-venv/bin/activate

Or, if you have virtualenvwrapper (I do) then, you might prefer:

python3 -m venv $WORKON_HOME/my-venv
workon my-venv
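
Either way, it can be worth checking that the environment is active before you install anything into it. The sketch below assumes an environment made with python3 -m venv, as above; in_venv is a made-up helper name, and environments made by older versions of the separate virtualenv tool may not be detected this way.

```shell
# Report whether the python3 on the PATH is running inside a virtual
# environment. Inside a venv, sys.prefix points at the environment and
# differs from sys.base_prefix.
# in_venv is a hypothetical helper name.
in_venv() {
    python3 -c 'import sys; print("venv" if sys.prefix != sys.base_prefix else "system")'
}

in_venv
```

If this prints "system", activate the environment again before running pip.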

Install the Python packages you need for building the site:

pip install -r build-requirements.txt

Install the site build / serve engine, Jekyll, by following the Jekyll install instructions.

Then install the Ruby gems listed in the Gemfile:

bundle install

Finally, check that you can run the local website server with:

make serve

Copy the URL that comes up, and paste it into your browser's URL bar, to check you can load the local website copy.

Configuring Jupyter to save / load in R Markdown

I'm using the excellent Jupytext to make it easier to edit Jupyter Notebooks. Jupytext automates saving Notebook files as Markdown (and other formats), and loading them from edited Markdown (and other formats).

You need to configure Jupyter to use it. If you don't have a Jupyter configuration, do:

jupyter notebook --generate-config

You should now have a file ~/.jupyter/jupyter_notebook_config.py. Append these lines:

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
c.ContentsManager.default_jupytext_formats = "ipynb,Rmd"

I also turned off autosave globally, by following the instructions in this Stack Overflow answer. This stops autosave from overwriting any edits that I am making in the Markdown source.

Be careful - if you are used to autosave in Jupyter, you can easily lose work when you disable autosave.

mkdir -p ~/.jupyter/custom

Add the following line to ~/.jupyter/custom/custom.js:

Jupyter.notebook.set_autosave_interval(0); // disable autosave

Finally, you may want to clone the original Berkeley textbook:

# Get out of dsfe tree
cd ..
git clone https://github.com/data-8/textbook

Make sure you are using the last commit we can legally use, from the Berkeley repository:

cd textbook
# Checkout the last CC-BY-NC commit
git checkout 64b20f0

Extra stuff

Consider installing hub to make command-line interactions with GitHub easier.

Configuring build etc

You might want to check the instructions for configuring the build at https://github.com/choldgraf/textbooks-with-jupyter.

Workflow

Developing

  • make serve to run the local server, serving the _site directory.
  • Point your browser at http://localhost:4000/dsfe/ as suggested in the output of make serve.
  • Edit .Rmd and / or .ipynb files
  • make rebuild-notebooks to rebuild .ipynb from more recent .Rmd files, and rebuild .md files from more recent .ipynb files.
  • Review in browser
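
The "more recent" test in the rebuild step is a file-timestamp comparison. The sketch below shows that idea in plain shell. It is an illustration under my own assumptions, not the Makefile's actual rule; needs_rebuild and the page.* file names are made up.

```shell
# Succeed (exit status 0) when the target is missing or older than the
# source, which is when a rebuild would be needed.
# needs_rebuild is a hypothetical helper name.
needs_rebuild() {
    src="$1"
    target="$2"
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

# Demo with throwaway files; the names are placeholders.
demo_dir="$(mktemp -d)"
touch "$demo_dir/page.Rmd"
if needs_rebuild "$demo_dir/page.Rmd" "$demo_dir/page.ipynb"; then
    echo "page.ipynb needs rebuilding"
fi
rm -rf "$demo_dir"
```

One consequence of timestamp-based rebuilds: if you edit both the .Rmd and the .ipynb version of a page, the more recently saved file wins.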

Shipping

  • Final check
  • Ship with make github

Case-sensitive filesystem on the Mac

The default file-systems for current Macs are Journalled HFS+, or APFS, neither of which is case-sensitive by default. This causes problems with file-names for the built files - see https://github.com/choldgraf/jupyter-book/pull/27.

You can check if you are on a case-sensitive file-system with:

mkdir tmp
touch tmp/abcd.txt
touch tmp/abcD.txt
ls tmp/ab*.txt

If you see only one file listed, you're on a case-insensitive file-system, and this will cause problems for editing and uploading the files in this repo.
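
If you want to repeat the check for several locations, the same test can be wrapped in a function that cleans up after itself. This is a sketch rather than part of the repository; check_case_sensitivity is a made-up name, and it needs write access to the directory you point it at.

```shell
# Print whether the file-system holding directory $1 (default: the
# current directory) is case-sensitive.
# check_case_sensitivity is a hypothetical helper name.
check_case_sensitivity() {
    probe_dir="$(mktemp -d "${1:-.}/case-probe.XXXXXX")" || return 1
    # On a case-insensitive file-system these two names are the same file.
    touch "$probe_dir/abcd.txt" "$probe_dir/abcD.txt"
    count="$(ls "$probe_dir" | wc -l | tr -d ' ')"
    rm -rf "$probe_dir"
    if [ "$count" -eq 2 ]; then
        echo "case-sensitive"
    else
        echo "case-insensitive"
    fi
}

check_case_sensitivity "$PWD" || echo "could not write probe files"
```

Run it from the directory where you plan to clone the repository, because different volumes on the same machine can give different answers.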

The easy way, with modern macOS

The easiest way to solve this, on a modern Mac, is to make a new case-sensitive APFS volume. Go to Disk Utility, click on a hard drive, click on the + icon at the top left, under "Volume", and you should get a GUI for "Add APFS volume to a container?". Choose "APFS (Case-sensitive)". With all done, you should have a new volume, into which you can clone the repository.

The hard way, with older macOS

If you do not have the option above, you can also make a new disk image file, and mount that. In my hands, this started to get very slow, as the disk image got close to full. Here are the instructions in case you want to give it a go:

  • Make a disk image with a case-sensitive file-system on it.
  • Mount the disk image
  • Work inside the mounted disk image.
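
For a one-off, the first two steps can be done by hand with macOS's hdiutil. The sketch below is my own illustration, not the contents of the Gist that follows; the function name, the dsfe-work volume name and the 10g size are all placeholders. It does not set up automatic mounting at login.

```shell
# Create and mount a case-sensitive sparse disk image (macOS only).
# $1: image / volume name, $2: maximum size. Both defaults are
# placeholders; make_case_sensitive_image is a hypothetical name.
make_case_sensitive_image() {
    name="${1:-dsfe-work}"
    size="${2:-10g}"
    if [ "$(uname)" != "Darwin" ]; then
        echo "hdiutil is only available on macOS" >&2
        return 1
    fi
    # A sparse image takes up space only as it fills, up to $size.
    hdiutil create -size "$size" -type SPARSE \
        -fs "Case-sensitive Journaled HFS+" \
        -volname "$name" "$name" || return 1
    # hdiutil adds the .sparseimage extension; the volume should then
    # appear under /Volumes.
    hdiutil attach "$name.sparseimage"
}

# Example call (not run here):
#   make_case_sensitive_image dsfe-work 10g
```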

Clone this Gist:

git clone https://gist.github.com/faa9ccc0d7cb2936263f16192106a98a

Have a look at the .plist file inside, and follow the instructions in the comments, to set this up. When you have followed the instructions, you should find that the system mounts the image automatically when you log in.
