Updated documentation, and mostly requirements. Simplified a bit the config.
jsga committed Feb 1, 2020
1 parent 1cd6c17 commit 544d4dc
Showing 20 changed files with 223 additions and 241 deletions.
23 changes: 19 additions & 4 deletions .gitignore
@@ -1,10 +1,25 @@
# My vscode config
.vscode/

# Virtual env
.venv*
.vscode/*

# Distribution / packaging
*__pycache__
bundle_notifications.egg-info
dist/
.eggs/
notebooks/.ipynb_checkpoints
build/

# Test reports
htmlcov/
.coverage
.tox/
.pytest*

# Sphinx documentation
docs/_build/
*__pycache__
.vscode/settings.json

# For now, do not keep the notebooks nor the data
notebooks/*
data/*
9 changes: 9 additions & 0 deletions .travis.yml
@@ -0,0 +1,9 @@
language: python
python:
- "3.7.4"
# command to install dependencies
install:
- pip install -r requirements_dev.txt
- pip install .
# command to run tests
script: make test
4 changes: 3 additions & 1 deletion .vscode/settings.json
@@ -1,3 +1,5 @@
{
"python.pythonPath": "/Users/jsg/Documents/GitHub/bundle_notifications_ds/.venv_bundle_notifications/bin/python3"
"python.pythonPath": "/Users/jsg/Documents/GitHub/bundle_notifications_ds/.venv_bundle_notifications/bin/python3",
"python.formatting.provider": "black",
"git.ignoreLimitWarning": true
}
73 changes: 4 additions & 69 deletions CONTRIBUTING.rst
@@ -15,7 +15,7 @@ Types of Contributions
Report Bugs
~~~~~~~~~~~

Report bugs at https://github.com/jsg/bundle_notifications/issues.
Report bugs at https://github.com/jsga/bundle_notifications/issues.

If you are reporting a bug, please include:

@@ -45,7 +45,7 @@ articles, and such.
Submit Feedback
~~~~~~~~~~~~~~~

The best way to send feedback is to file an issue at https://github.com/jsg/bundle_notifications/issues.
The best way to send feedback is to file an issue at https://github.com/jsga/bundle_notifications/issues.

If you are proposing a feature:

@@ -54,75 +54,10 @@ If you are proposing a feature:
* Remember that this is a volunteer-driven project, and that contributions
are welcome :)

Get Started!
------------

Ready to contribute? Here's how to set up `bundle_notifications` for local development.

1. Fork the `bundle_notifications` repo on GitHub.
2. Clone your fork locally::

$ git clone git@github.com:your_name_here/bundle_notifications.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::

$ mkvirtualenv bundle_notifications
$ cd bundle_notifications/
$ python setup.py develop

4. Create a branch for local development::

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. When you're done making changes, check that your changes pass flake8 and the
tests, including testing other Python versions with tox::

$ flake8 bundle_notifications tests
$ python setup.py test or pytest
$ tox

To get flake8 and tox, just pip install them into your virtualenv.

6. Commit your changes and push your branch to GitHub::

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.

Pull Request Guidelines
-----------------------

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
https://travis-ci.org/jsg/bundle_notifications/pull_requests
and make sure that the tests pass for all supported Python versions.

Tips
----

To run a subset of tests::

$ pytest tests.test_bundle_notifications


Deploying
---------

A reminder for the maintainers on how to deploy.
Make sure all your changes are committed (including an entry in HISTORY.rst).
Then run::
To run tests::

$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags
$ make test

Travis will then deploy to PyPI if tests pass.
2 changes: 1 addition & 1 deletion HISTORY.rst
@@ -5,4 +5,4 @@ History
0.1.0 (2020-01-29)
------------------

* First release on PyPI.
* First release.
12 changes: 4 additions & 8 deletions Makefile
@@ -42,7 +42,6 @@ clean-pyc: ## remove Python file artifacts
find . -name '__pycache__' -exec rm -fr {} +

clean-test: ## remove test and coverage artifacts
rm -fr .tox/
rm -f .coverage
rm -fr htmlcov/
rm -fr .pytest_cache
@@ -53,9 +52,6 @@ lint: ## check style with flake8
test: ## run tests quickly with the default Python
pytest

test-all: ## run tests on every Python version with tox
tox

coverage: ## check code coverage quickly with the default Python
coverage run --source bundle_notifications -m pytest
coverage report -m
@@ -73,8 +69,8 @@ docs: ## generate Sphinx HTML documentation, including API docs
servedocs: docs ## compile the docs watching for changes
watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .

release: dist ## package and upload a release
twine upload dist/*
# release: dist ## package and upload a release
# twine upload dist/*

dist: clean ## builds source and wheel package
python setup.py sdist
@@ -87,8 +83,8 @@ install: clean ## install the package to the active Python's site-packages
new_env: ## Create new virtual env
python -m venv .venv_bundle_notifications

activate: ## Activate virtual env (not working FIXME)
@bash -c "source .venv_bundle_notifications/bin/activate"
# activate: ## Activate virtual env (not working FIXME)
# @bash -c "source .venv_bundle_notifications/bin/activate"

requirements: ## Install requirements and current package
python -m pip install -U pip setuptools wheel
126 changes: 110 additions & 16 deletions README.rst
@@ -3,35 +3,129 @@ bundle_notifications
====================


.. image:: https://img.shields.io/pypi/v/bundle_notifications.svg
:target: https://pypi.python.org/pypi/bundle_notifications

.. image:: https://img.shields.io/travis/jsg/bundle_notifications.svg
:target: https://travis-ci.org/jsg/bundle_notifications
.. image:: https://img.shields.io/travis/jsga/bundle_notifications.svg
:target: https://travis-ci.org/jsga/bundle_notifications

.. image:: https://readthedocs.org/projects/bundle-notifications/badge/?version=latest
:target: https://bundle-notifications.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status



Features
---------

Tool for bundling notifications
This package contains a tool for bundling notifications in event streams. The goal is to minimize the number of notifications sent to users.

As an example, here are the first few rows for a sample user_id:

* Free software: MIT license
* Documentation: https://bundle-notifications.readthedocs.io.
=================== ============================== ============================== =============
timestamp           user_id                        friend_id                      friend_name
=================== ============================== ============================== =============
2017-08-08 11:04:36 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor
2017-08-10 12:29:47 0005BDD51B0185DCF1A4932CEB8437 266C5C5239255DF65ECFFDCEAF7048 Iustinian
2017-08-11 11:53:12 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor
2017-08-12 23:42:11 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara
2017-08-24 14:49:06 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor
2017-08-31 14:30:48 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara
2017-09-01 13:21:59 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie
2017-09-01 13:29:40 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie
2017-09-01 17:13:37 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie
2017-09-26 13:02:32 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara
=================== ============================== ============================== =============

Using the bundle_notifications tool, we get the following DataFrame with 3 new columns:

Features
--------
1. **tours**: number of friends that have gone on a tour since the beginning of the stream of data
2. **timestamp_first_tour**: timestamp for the first tour amongst his/her friends
3. **message**: notification message

Here is the outcome:

=================== ============================== ============================== ============= ======= ====================== =====================================
timestamp           user_id                        friend_id                      friend_name   tours   timestamp_first_tour   message
=================== ============================== ============================== ============= ======= ====================== =====================================
2017-08-08 11:04:36 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor     1       2017-08-08 11:04:36    Rameshwor went on a tour
2017-08-10 12:29:47 0005BDD51B0185DCF1A4932CEB8437 266C5C5239255DF65ECFFDCEAF7048 Iustinian     2       2017-08-08 11:04:36    Rameshwor and 1 other went on a tour
2017-08-11 11:53:12 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor     2       2017-08-08 11:04:36    Rameshwor and 1 other went on a tour
2017-08-12 23:42:11 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara          3       2017-08-08 11:04:36    Rameshwor and 2 others went on a tour
2017-08-24 14:49:06 0005BDD51B0185DCF1A4932CEB8437 0B56C34B2BB9B80100D1D5B5AB74EA Rameshwor     3       2017-08-08 11:04:36    Rameshwor and 2 others went on a tour
2017-08-31 14:30:48 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara          3       2017-08-08 11:04:36    Rameshwor and 2 others went on a tour
2017-09-01 13:21:59 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie         4       2017-08-08 11:04:36    Rameshwor and 3 others went on a tour
2017-09-01 13:29:40 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie         4       2017-08-08 11:04:36    Rameshwor and 3 others went on a tour
2017-09-01 17:13:37 0005BDD51B0185DCF1A4932CEB8437 DACE6D3C78D9B20B1F70A271BA98D5 Julie         4       2017-08-08 11:04:36    Rameshwor and 3 others went on a tour
2017-09-26 13:02:32 0005BDD51B0185DCF1A4932CEB8437 FB63F29610B1EF67AD75C4BABDFCE1 Sara          4       2017-08-08 11:04:36    Rameshwor and 3 others went on a tour
=================== ============================== ============================== ============= ======= ====================== =====================================

Note that Julie went on 3 tours in a row. The tool takes this into account, so that Julie's friends receive a single notification that Julie went on a tour.
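
The three derived columns can be reproduced with a few pandas operations. The following is only a minimal sketch inferred from the sample rows in this README, not the package's own ``bundle_func``: the function name ``bundle_sketch``, the shortened IDs (``U1``, ``F1``, ...) and the assumption that the message always leads with the first friend's name are illustrative.

```python
import pandas as pd


def bundle_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative re-implementation; NOT the package's bundle_func."""
    df = df.sort_values(["user_id", "timestamp"]).copy()

    # A friend counts once per user, however many tours they went on.
    is_new_friend = ~df.duplicated(["user_id", "friend_id"])
    tours = is_new_friend.astype(int).groupby(df["user_id"]).cumsum()

    by_user = df.groupby("user_id")
    first_timestamp = by_user["timestamp"].transform("min")
    # Assumption: the message leads with the first friend seen for the user.
    first_name = by_user["friend_name"].transform("first")

    def message(name: str, n_tours: int) -> str:
        others = n_tours - 1
        if others == 0:
            return f"{name} went on a tour"
        noun = "other" if others == 1 else "others"
        return f"{name} and {others} {noun} went on a tour"

    df["tours"] = tours
    df["timestamp_first_tour"] = first_timestamp
    df["message"] = [message(n, t) for n, t in zip(first_name, tours)]
    return df


# Made-up, shortened IDs; names and timestamps follow the sample above.
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2017-08-08 11:04:36",
                "2017-08-10 12:29:47",
                "2017-08-11 11:53:12",
                "2017-08-12 23:42:11",
            ]
        ),
        "user_id": ["U1"] * 4,
        "friend_id": ["F1", "F2", "F1", "F3"],
        "friend_name": ["Rameshwor", "Iustinian", "Rameshwor", "Sara"],
    }
)
bundled = bundle_sketch(events)
print(bundled[["tours", "message"]])
```

Counting only the first appearance of each ``(user_id, friend_id)`` pair is what keeps repeated tours by the same friend from inflating the ``tours`` column.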




Quickstart
-----------------


1. Clone the repository from GitHub::

$ git clone https://github.com/jsga/bundle_notifications.git

Alternatively, you can manually download the repository as a zip file.

2. Make sure the terminal is at the root of the package::

* TODO
$ cd bundle_notifications
$ pwd
>/Users/myuser/Documents/GitHub/bundle_notifications

You should see something like the above.

3. Install the package::

$ python setup.py install

4. You should be ready to start bundling your first notifications! Using an example dataset_ and printing only 10 rows::

$ bundle_notifications -p "https://static-eu-komoot.s3.amazonaws.com/backend/challenge/notifications.csv" -n 10

::

Downloading data...
Bundling notifications...
Great! Here there are the bundled notifications
timestamp user_id friend_id friend_name tours timestamp_first_tour message
------------------- ------------------------------ ------------------------------ ------------- ------- ---------------------- ------------------------
2017-08-01 00:06:47 F62712701E7AF6588B69A44235A6FC 06D188F4064E0D47BD760EEFEB7AAD Geir 1 2017-08-01 00:06:47 Geir went on a tour
2017-08-01 00:31:05 DF5BB50FAD220C8D2A8FF9A0DBAA47 588C89FCADD0DBA0E722822513A267 Antim 1 2017-08-01 00:31:05 Antim went on a tour
2017-08-01 00:35:24 8473CCCE79294CB494D1B42E2B1BAA EDBB3D240ADBCF6CF175B192630ABB Σωτήριος 1 2017-08-01 00:35:24 Σωτήριος went on a tour
2017-08-01 01:20:47 CFFEC5978B0A4A05FA6DCEFB2C82CC 2BB0471CAA78ED0FCEE143E175F034 Mona 1 2017-08-01 01:20:47 Mona went on a tour
2017-08-01 01:21:39 0978C6F8C5093039165B5C571EACC8 45FE4C99C612BEEDE6A34B54C5369D Laura 1 2017-08-01 01:21:39 Laura went on a tour
2017-08-01 01:21:58 FBA67EFA2766854B885F25C06CC2FA 92DEF3A48927B1B2B0295936679D1C Rameshwor 1 2017-08-01 01:21:58 Rameshwor went on a tour
2017-08-01 01:44:16 BE6B4CBB422BBF114FB109921F2B9F 7BCD287DF0EBF5CAA86458737777BD Noë 1 2017-08-01 01:44:16 Noë went on a tour
2017-08-01 02:09:58 391A4416FC0ADE8FD604B2F1A9BCCE 96593EE816FB4CE2AEBA5B754CFA38 Λεωνίδας 1 2017-08-01 02:09:58 Λεωνίδας went on a tour
2017-08-01 02:20:32 D12E9E35AF8817E88F94F966B9C1F8 723515D5D083C9C15EC9A24AA624D7 Lina 1 2017-08-01 02:20:32 Lina went on a tour
2017-08-01 02:20:32 DDBA7653545B1BB68658838A22BAA5 723515D5D083C9C15EC9A24AA624D7 Lina 1 2017-08-01 02:20:32 Lina went on a tour


Features: current and future
--------------------------------

The tool is built mainly on pandas. It relies on two main features: reading CSV files and the group-apply pattern. With a 30 MB dataset it takes around 1 minute to compute the grouping and write the messages.

The advantage of using pandas over custom-made tools is its simplicity. It is also easy to parallelize such functions: since the grouping is done by user_id, it would be straightforward to parallelize the computations with existing tools like Modin_ or, for large datasets, Dask_.
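
Why grouping by user_id makes this easy to split up can be shown without Modin or Dask. The sketch below is an assumption-laden illustration, not part of the package: it swaps in a standard-library thread pool, the helper names (``count_friends``, ``bundle_partition``, ``bundle_in_partitions``) are hypothetical, and only the ``tours`` column is computed. A thread pool is used purely to show the partitioning; it may not give a real speed-up for this workload.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def count_friends(df_user: pd.DataFrame) -> pd.DataFrame:
    """Per-user work: cumulative number of distinct friends seen so far."""
    out = df_user.sort_values("timestamp").copy()
    out["tours"] = (~out.duplicated("friend_id")).astype(int).cumsum()
    return out


def bundle_partition(part: pd.DataFrame) -> pd.DataFrame:
    """Process every user found in one partition."""
    return pd.concat([count_friends(g) for _, g in part.groupby("user_id")])


def bundle_in_partitions(df: pd.DataFrame, n_partitions: int = 2) -> pd.DataFrame:
    # Each user_id lands in exactly one partition, so no user's events are split.
    codes, _ = pd.factorize(df["user_id"])
    parts = [df[codes % n_partitions == i] for i in range(n_partitions)]
    parts = [p for p in parts if not p.empty]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(bundle_partition, parts))
    return pd.concat(results).sort_index()


# Made-up events for two users.
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2017-08-01", "2017-08-02", "2017-08-03", "2017-08-04"]
        ),
        "user_id": ["A", "B", "A", "B"],
        "friend_id": ["x", "x", "y", "x"],
    }
)
parallel = bundle_in_partitions(events)
serial = bundle_partition(events).sort_index()
print(parallel)
```

Because no partition needs data from another, the partitioned result matches the single-pass result row for row.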

Here are some possible future improvements:

1. Implement checks. What if some IDs are empty?
2. Encapsulating this tool in a Docker image would make it much easier to move from development to a production server.
3. Parallelize the computation using Modin_ or Dask_. If the Docker image is in place, we could scale this up to many threads quite easily.
4. Add an option to read the data directly from a database, so that this tool can be run periodically without human supervision.
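
As an illustration of the first item, an input check could look roughly like the sketch below. It is hypothetical: the function name ``check_events`` and the policy of dropping rows with empty IDs (rather than raising) are assumptions, not behaviour of the current package.

```python
import pandas as pd

REQUIRED_COLUMNS = ["timestamp", "user_id", "friend_id", "friend_name"]


def check_events(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-check: fail on missing columns, drop rows with empty IDs."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    ids = df[["user_id", "friend_id"]]
    # Treat NaN/None and blank or whitespace-only strings as empty.
    blank = ids.astype(str).apply(lambda col: col.str.strip() == "")
    empty = ids.isna() | blank
    return df[~empty.any(axis=1)]


# Made-up rows: only the first one has both IDs filled in.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2017-08-01", "2017-08-02", "2017-08-03"]),
        "user_id": ["U1", None, "U3"],
        "friend_id": ["F1", "F2", "   "],
        "friend_name": ["Geir", "Antim", "Mona"],
    }
)
clean = check_events(raw)
print(clean)
```

Whether to drop, log, or reject such rows is a design decision left open here.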


* Documentation: https://bundle-notifications.readthedocs.io.

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
.. _dataset: https://static-eu-komoot.s3.amazonaws.com/backend/challenge/notifications.csv
.. _modin: https://github.com/modin-project/modin
.. _Dask: https://dask.org/
5 changes: 0 additions & 5 deletions build/lib/bundle_notifications/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion build/lib/bundle_notifications/bundle_notifications.py

This file was deleted.

16 changes: 0 additions & 16 deletions build/lib/bundle_notifications/cli.py

This file was deleted.

9 changes: 4 additions & 5 deletions bundle_notifications/bundle_notifications.py
@@ -16,9 +16,8 @@ def load_data(path_csv = "https://static-eu-komoot.s3.amazonaws.com/backend/chal
Returns
-------
pd.DataFrame
DataFrame containing the stream of data. It has 4 columns:
'timestamp','user_id','friend_id','friend_name'
The column named 'timestamp' is returned as a datetime64[ns] type.
DataFrame containing the stream of data. It has 4 columns: 'timestamp','user_id','friend_id','friend_name'
The column named 'timestamp' is cast as a datetime64[ns] type.
"""

@@ -63,12 +62,12 @@ def bundle_func(df_g):
Parameters
----------
df_g : pd.DataFrame
DataFrame containing 4 columns: 'timestamp','user_id','friend_id','friend_name'
DataFrame containing 3 columns: 'timestamp_first_tour','tours','message'
Returns
-------
pd.DataFrame
Contains 3 extra columns
Contains 3 extra columns as explained above
"""

