A Vagrant machine ready for development of data science projects in Python.
The box includes:
1. Basics: numpy, pandas, scipy, jupyter
2. Data acquisition: requests, beautifulsoup, feedparser, scrapy
3. NLP: re, NLTK
4. Database connection: psycopg2, pymongo, pyodbc
5. AWS: boto3
6. Web Framework: flask
7. Visualization: matplotlib, seaborn, bokeh
8. Machine Learning: scikit-learn, theano, keras, tensorflow
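To confirm that the libraries listed above are present inside the box, a short check like the following can be run from the VM. This is only a sketch: the import names in `EXPECTED_MODULES` are assumptions, since some differ from the package names (for example `bs4` for beautifulsoup and `sklearn` for scikit-learn), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed in this README.
# jupyter is left out because it is normally used as a command, not imported.
EXPECTED_MODULES = [
    "numpy", "pandas", "scipy",
    "requests", "bs4", "feedparser", "scrapy",
    "re", "nltk",
    "psycopg2", "pymongo", "pyodbc",
    "boto3", "flask",
    "matplotlib", "seaborn", "bokeh",
    "sklearn", "theano", "keras", "tensorflow",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as tensorflow.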
· Install VirtualBox
· Install Vagrant
· In the terminal, clone the repository
$ git clone git@github.com:jaimeps/vagrant-data-science.git
· Change to your project directory
$ cd vagrant-data-science
· To create the VM, run
$ vagrant up
· Once the setup is finished, we can log in to the VM
$ vagrant ssh
To launch Jupyter, run
$ jupyter notebook --ip=0.0.0.0
Open the notebook in the host's browser at http://127.0.0.1:8888
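If the page does not load, it can help to check from the host whether the port is reachable at all. The sketch below assumes the Vagrantfile forwards guest port 8888 to host port 8888, as the URL above implies; `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", port_is_open())
```

A `False` result usually means either that Jupyter is not running in the VM or that the port is not forwarded to the host.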
We can use our local PyCharm IDE with the Vagrant box as the Python interpreter.
In PyCharm, go to Preferences > Project > Project Interpreter
Click the settings button and select "Add remote"
Select Vagrant as the interpreter type and choose the folder of the Vagrant box on your computer
PyCharm should now list the Vagrant box as the project interpreter.
For convenience, the "shared" folder is synced between the host and the VM.
- To increase the memory or CPU count, change the following lines in the Vagrantfile:
vb.memory = "1024"
vb.cpus = 4
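After changing these values, the resources the guest actually sees can be checked from inside the VM. The following is a minimal sketch that assumes a Linux guest exposing `/proc/meminfo`; `parse_mem_total_mb` is a hypothetical helper.

```python
import os


def parse_mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text, converted to MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])  # the value is reported in kB
            return kb // 1024
    return None


if __name__ == "__main__":
    # /proc/meminfo is Linux-specific; skip the memory check elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("Memory (MB):", parse_mem_total_mb(f.read()))
    print("CPUs:", os.cpu_count())
```

The reported memory is typically a little lower than the configured value because the kernel reserves part of it. Changes to the Vagrantfile generally require restarting the VM (for example with `vagrant reload`) before they take effect.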
- To add/remove Python modules in the setup, see the script in
bootstrap.sh