Lighter-weight 101 intro version of spam tutorial (#37)
* First rev on lighter-weight intro tutorial

* Fixing @brahmaneya edits in PR

* Editing down code to minimal form as suggested by HE

* Simplifying code; adding TF example; editing

* Editing pass over text

* Transfer tags w jupytext build, minor edits

* Filtering abstain values

* Added a stub for SFs

* Trying to fix style check

* Skipping env_* files in flake

* Style fixes

* PR changes requested by @henryre

* Quieting nltk output

* Moved to getting_started

* Silencing LogReg warning

* Forgot to sync style fix...

* Spelling fix

* Addressing comments

* First rev on lighter-weight intro tutorial

* Fixing @brahmaneya edits in PR

* Hot fix LF names (#63)

* Mtl updates (#41)

* [EASY] Update Scorer import paths (#58)

* Save MTL updates in progress

* Give more API hints

* More text updates

* Hide unnecessary helpers in utils

* Drop unused import and sync notebook

* Save MTL updates in progress

* Give more API hints

* More text updates

* Hide unnecessary helpers in utils

* Drop unused import and sync notebook

* Address comments

* Rename mtl to multitask so file and tutorial match

* Update Scorer import paths

* Update names of loss and output funcs

* Update name of ce_loss_from_outputs()

* Rename SnorkelClassifier to MultitaskClassifier (#59)

* Update Scorer import

* Update Scorer import in vrd_tutorial

* Remove unused import

* [EASY] Add links to RTD in multitask tutorial (#65)

* Add links to RTD in multitask tutorial

* Separate sentences

* Sync multitask.ipynb

* Editing down code to minimal form as suggested by HE

* Add Drybell tutorial (#62)

* Add drybell tutorial

* Add to tox and README

* Install Java on Travis

* Pass JAVA_HOME

* Add README

* Update API

* Revisions to crowdsourcing tutorial (#64)

* Revisions to crowdsourcing tutorial

* Run tox

* Address comments

* Simplifying code; adding TF example; editing

* Recsys novel (#61)

* Recsys first commit

* Backup

* Added second version

* Recsys modeling work

* Add review processing and LFs.

* First complete draft

* typo

* Add ipynb

* Add comments, refactor

* Address comments

* Updated ipynb

* Update tox.ini to allow sync / test for recsys (but not by default)

* Address comments

* Update ipynb

* Address final comments

* Fix determinism of TF tutorial (#67)

* Fix determinism of TF tutorial

* Add os PYTHONGHASHSEED back

* Editing pass over text

* Transfer tags w jupytext build, minor edits

* Filtering abstain values

* Added a stub for SFs

* Trying to fix style check

* Skipping env_* files in flake

* Style fixes

* PR changes requested by @henryre

* Quieting nltk output

* Separate download scripts, feedback session (#55)

* Moved to getting_started

* Silencing LogReg warning

* Slicing spam (#18)

* Forgot to sync style fix...

* Add link checking (#72)

* Add link checking

* Fix

* Only run Travis on changed tutorials (#74)

* Only run Travis on changed tutorials

* Fix

* Address comments

* Fix

* Fix a couple links (#75)

* Fix link

* Fix links

* Fix Travis branch check (#77)

* Spelling fix

* Stop training on dev set [EASY] (#71)

* Stop training on dev set

* Update image link

* Add style to run envs (#78)

* Add style to run envs

* Simplify

* Make travis faster for spouse [EASY] (#80)

* Make travis faster for spouse

* remove extra cell

* sync

* all caps for constant

* More verbose build script (#81)

* Add markdown build mode (#79)

* Add markdown build mode

* Fix kwarg

* Be a bit more opinionated

* [EASY] Update path to snorkel to reflect ownership transfer (#82)

* Update path to snorkel to reflect ownership transfer

* Restore path to snorkel-superglue on HazyResearch

* Make travis only run on changed dirs (#85)

* Make travis only run on changed dirs

* Small fix

* Add space

* Update tutorials with MultitaskClassifier API changes (#68)

* Update multitask and scene_graph to last_op

* Update spam tutorial

* Update notebooks

* Sync visual_relation notebook

* Update spam notebooks

* Run tox -e fix

* Remove unused import

* Configure markdown generation (#87)

* Configure markdown generation

* Add comments

* [EASY] Replace `mtl` with `multitask` in README (#83)

* Deploy tutorial pages via Travis (#88)

* Deploy tutorial pages via Travis

* Fix commands

* Update readme (#73)

* Update readme

* Address comment

* Addressing comments
ajratner committed Aug 14, 2019
1 parent 3476da0 commit 3bf73c9
Showing 13 changed files with 1,035 additions and 3 deletions.
4 changes: 2 additions & 2 deletions .flake8
@@ -15,8 +15,8 @@ exclude =
.git,
.mypy_cache,
.tox,
-.env,
-.venv,
+.env**,
+.venv**,
_build,
build,
dist
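The effect of widening `.env` to `.env**` can be illustrated with a small sketch. It assumes flake8 treats each `exclude` entry as a shell-style glob pattern, and it uses bash `case` matching as a stand-in for that behavior; the `matches_exclude` helper and the directory names are hypothetical and not part of this commit.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if name ($1) matches the glob pattern ($2).
# Bash `case` matching stands in for flake8's glob matching here (an assumption).
matches_exclude() {
    case "$1" in
        $2) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative directory names only; none of these come from the repository.
for name in .env .env_spam .venv .venv36 env_spam; do
    if matches_exclude "$name" ".env**" || matches_exclude "$name" ".venv**"; then
        echo "$name: excluded"
    else
        echo "$name: checked"
    fi
done
```

Under this assumption, the old literal entries would have covered only `.env` and `.venv`, while the new patterns also cover suffixed environment directories.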
3 changes: 3 additions & 0 deletions getting_started/.gitignore
@@ -0,0 +1,3 @@
# Logs
results/
logs/
1 change: 1 addition & 0 deletions getting_started/.notebooks
@@ -0,0 +1 @@
getting_started
Empty file added getting_started/__init__.py
Empty file.
32 changes: 32 additions & 0 deletions getting_started/download_data.sh
@@ -0,0 +1,32 @@
#!/bin/bash
set -euxo pipefail

# Check that we are running from the right directory.
if [ ! "${PWD##*/}" = "getting_started" ]; then
    echo "Script must be run from getting_started directory" >&2
    exit 1
fi

FILES=( "Youtube01-Psy.csv" "Youtube02-KatyPerry.csv" "Youtube03-LMFAO.csv" "Youtube04-Eminem.csv" "Youtube05-Shakira.csv" )
DATA_URL="https://archive.ics.uci.edu/ml/machine-learning-databases/00380/YouTube-Spam-Collection-v1.zip"
RELOAD=false

# Check whether any file is missing. If so, reload all data.
for filename in "${FILES[@]}"
do
    if [ ! -e "data/$filename" ]; then
        RELOAD=true
    fi
done

if [ "$RELOAD" = true ]; then
    if [ -d "data/" ]; then rm -Rf "data/"; fi
    mkdir -p data
    wget $DATA_URL -O data.zip
    mv data.zip data/
    cd data
    unzip data.zip
    rm data.zip
    rm -rf __MACOSX
    cd ..
fi
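
The missing-file check in the script above can also be written as a reusable function, which makes the "reload if anything is absent" logic easier to test without touching the network. This is only a sketch: `any_missing` is a hypothetical helper that does not appear in the commit, and the directory and file names in the usage lines are placeholders.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): succeeds (returns 0) if at least
# one of the named files is absent from the given directory.
any_missing() {
    local dir="$1"
    shift
    local filename
    for filename in "$@"; do
        if [ ! -e "$dir/$filename" ]; then
            return 0
        fi
    done
    return 1
}

# Usage sketch: decide whether a fresh download would be needed.
if any_missing "data" "Youtube01-Psy.csv" "Youtube02-KatyPerry.csv"; then
    echo "reload needed"
else
    echo "data already present"
fi
```

As in the original script, a single missing file is enough to trigger a full reload, which keeps the data directory consistent at the cost of re-downloading the whole archive.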
