This repository has been archived by the owner on Oct 5, 2021. It is now read-only.

Merge branch 'redis-cache'
Olivier Demah committed Jun 14, 2015
2 parents 3557023 + 2ca2d5e commit 2767630
Showing 13 changed files with 542 additions and 451 deletions.
68 changes: 41 additions & 27 deletions README.rst
@@ -55,12 +55,14 @@ which lets you use a task manager like "cron" and, of course, Python.
Requirements
============

- * Python 3.4.x, 2.7.x
+ * Python 3.4.x
* `Django <https://pypi.python.org/pypi/Django/>`_ >= 1.8
* `Celery <http://www.celeryproject.org/>`_ == 3.1.18
* `django-th-rss <https://github.com/foxmask/django-th-rss>`_ == 0.3.0
* `django-th-pocket <https://github.com/foxmask/django-th-pocket>`_ == 0.2.0
* `django-js-reverse <https://pypi.python.org/pypi/django-js-reverse/>`_ == 0.3.3

* `django-redis-cache <https://pypi.python.org/pypi/django-redis-cache/>`_ == 0.13.1
* `django-redisboard <https://pypi.python.org/pypi/django-redisboard/>`_ == 1.2.0

Installation
============
@@ -134,17 +136,6 @@ TH_SERVICES is a list of the services we, like for example,
)
If you plan to integrate django_th into an existing project, then to handle the templates and avoid a TemplateDoesNotExist error, you can copy the templates into your own templates directory or set the path like this:
.. code:: python

    import os
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
    TEMPLATE_DIRS += (
        BASE_DIR + '/../lib/<python-version>/site-packages/django_th/templates/',
    )
urls.py
-------
@@ -163,6 +154,39 @@ urls.py
)
CACHE
~~~~~
For each TriggerHappy component, define one cache entry like below:
.. code:: python

    # each entry below goes in the CACHES setting of settings.py
    # RSS Cache
    'th_rss': {
        'TIMEOUT': 500,
        'BACKEND': 'redis_cache.cache.RedisCache',
        'LOCATION': '127.0.0.1:6379',
        'OPTIONS': {
            'DB': 2,
            'CLIENT_CLASS': 'redis_cache.client.DefaultClient',
        },
    },
    # Twitter Cache
    'th_twitter': {
        'TIMEOUT': 500,
        'BACKEND': 'redis_cache.cache.RedisCache',
        'LOCATION': '127.0.0.1:6379',
        'OPTIONS': {
            'DB': 3,
            'CLIENT_CLASS': 'redis_cache.client.DefaultClient',
        },
    },
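Each component gets the same entry shape, only the name and Redis DB number change. As an illustration, a hypothetical helper (not part of django_th) could generate those entries instead of writing them out by hand; the component names, starting DB number, and location below just mirror the sample above:

```python
# Hypothetical helper: build one Redis cache entry per TriggerHappy
# component, each pointing at its own Redis database.
def build_caches(components, start_db=2, location='127.0.0.1:6379'):
    caches = {}
    for offset, name in enumerate(components):
        caches[name] = {
            'TIMEOUT': 500,
            'BACKEND': 'redis_cache.cache.RedisCache',
            'LOCATION': location,
            'OPTIONS': {
                'DB': start_db + offset,
                'CLIENT_CLASS': 'redis_cache.client.DefaultClient',
            },
        }
    return caches

# th_rss lands on Redis DB 2 and th_twitter on DB 3, as in the sample
print(build_caches(['th_rss', 'th_twitter'])['th_twitter']['OPTIONS']['DB'])  # 3
```

The resulting dict can then be merged into the CACHES setting alongside the default cache.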
Setting up : Administration
===========================
@@ -215,25 +239,15 @@ Here are the available management commands:
[django_th]
fire_th
fire_th_as        # uses asyncio
fire_th_trollius  # uses "trollius", the asyncio backport for Python 2.7

To start handling the queue of triggers you/your users configured, just set the management command fire_th in a crontab or any other scheduler solution of your choice, e.g.:

.. code:: python

    manage.py fire_th            # will chain both read and publish
    manage.py fire_read_data     # will get the data from any service and put them in the cache
    manage.py fire_publish_data  # will read the data from the cache and push them to another service

Also: keep in mind not to set too short a delay between two runs, to avoid being blocked by the rate limits of the external services you/your users want to reach.
6 changes: 6 additions & 0 deletions django_th/__init__.py
@@ -1,4 +1,10 @@
from __future__ import absolute_import

VERSION = (0, 10, 1) # PEP 386
__version__ = ".".join([str(x) for x in VERSION])

default_app_config = 'django_th.apps.DjangoThConfig'

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
28 changes: 28 additions & 0 deletions django_th/celery.py
@@ -0,0 +1,28 @@
from __future__ import absolute_import

import os

from celery import Celery
from celery.schedules import crontab
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_th.settings')

app = Celery('django_th')

app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# register the two periodic tasks with the app (a bare module-level
# CELERYBEAT_SCHEDULE would be ignored here, since config_from_object
# only loads the Django settings):
# read_data fires at minutes 0 and 41 of every hour,
# publish_data fires once an hour, on the hour
app.conf.update(CELERYBEAT_SCHEDULE={
    'add-read-data': {
        'task': 'tasks.read_data',
        'schedule': crontab(minute='*/41'),
    },
    'add-publish-data': {
        'task': 'tasks.publish_data',
        'schedule': crontab(hour='*/1', minute=0),
    },
})
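Note that cron step syntax can be surprising: ``minute='*/41'`` does not mean "every 41 minutes" but "every minute of the hour divisible by 41". A stdlib-only check of that arithmetic (no Celery needed):

```python
# Minutes matched by the cron step expression */41: every m in 0..59
# with m % 41 == 0, i.e. the task fires at :00 and :41 of each hour,
# not once every 41 minutes.
matched = [m for m in range(60) if m % 41 == 0]
print(matched)  # [0, 41]
```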
6 changes: 6 additions & 0 deletions django_th/local_settings_sample.py
Expand Up @@ -24,3 +24,9 @@
'consumer_key': 'my key',
'consumer_secret': 'my secret'
}

# CELERY
BROKER_URL = 'redis://localhost:6379/0'

# REDISBOARD
REDISBOARD_DETAIL_FILTERS = ['.*']
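The BROKER_URL above follows the ``redis://host:port/db`` convention. Purely as an illustration (Celery does its own URL parsing), the stdlib parser shows how the pieces decompose — note the trailing ``/0`` selects Redis DB 0, separate from the cache DBs (2, 3, ...) used elsewhere in the settings:

```python
from urllib.parse import urlparse

# Decompose the sample BROKER_URL into scheme, host, port and DB number.
broker = urlparse('redis://localhost:6379/0')
print(broker.scheme, broker.hostname, broker.port, broker.path.lstrip('/'))
```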
22 changes: 22 additions & 0 deletions django_th/management/commands/fire_publish_data.py
@@ -0,0 +1,22 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import logging

from django.core.management.base import BaseCommand

from django_th.tasks import publish_data

# create logger
logger = logging.getLogger('django_th.trigger_happy')


class Command(BaseCommand):

help = 'Read the data from the cache and publish them to the consumer services'

def handle(self, *args, **options):
"""
get all the triggers that need to be handled
"""
publish_data.delay()
22 changes: 22 additions & 0 deletions django_th/management/commands/fire_read_data.py
@@ -0,0 +1,22 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import logging

from django.core.management.base import BaseCommand

from django_th.tasks import read_data

# create logger
logger = logging.getLogger('django_th.trigger_happy')


class Command(BaseCommand):

help = 'Trigger all the services and put them in cache'

def handle(self, *args, **options):
"""
get all the triggers that need to be handled
"""
read_data.delay()
162 changes: 9 additions & 153 deletions django_th/management/commands/fire_th.py
100755 → 100644
@@ -1,168 +1,24 @@
#!/usr/bin/env python
# coding: utf-8
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import datetime
import time
import arrow

from django.core.management.base import BaseCommand
from django.conf import settings
from django_th.services import default_provider
from django_th.models import TriggerService
from django.utils.log import getLogger

from django_th.tasks import read_data
from django_th.tasks import publish_data

# create logger
logger = getLogger('django_th.trigger_happy')


class Command(BaseCommand):
help = 'Trigger all the services'

def update_trigger(self, service):
"""
update the date when occurs the trigger
"""
now = arrow.utcnow().to(settings.TIME_ZONE).format(
'YYYY-MM-DD HH:mm:ss')
TriggerService.objects.filter(id=service.id).update(date_triggered=now)

def to_datetime(self, data):
"""
convert Datetime 9-tuple to the date and time format
feedparser provides this 9-tuple
"""
my_date_time = None

if 'published_parsed' in data:
my_date_time = datetime.datetime.utcfromtimestamp(
time.mktime(data.published_parsed))
elif 'updated_parsed' in data:
my_date_time = datetime.datetime.utcfromtimestamp(
time.mktime(data.updated_parsed))
elif 'my_date' in data:
my_date_time = arrow.get(str(data['my_date']),
'YYYY-MM-DD HH:mm:ss')

return my_date_time
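The to_datetime() helper above converts the 9-tuple (a time.struct_time) that feedparser exposes for published_parsed/updated_parsed. A standalone sketch of that conversion with an assumed sample date; note that time.mktime, used above, interprets the tuple as local time, while calendar.timegm (shown here) is the UTC-correct variant:

```python
import calendar
import datetime
import time

# A UTC 9-tuple such as feedparser would expose for published_parsed
# (2015-06-14 12:30:00 UTC; the last three fields are weekday,
# day-of-year, and the DST flag).
parsed = time.struct_time((2015, 6, 14, 12, 30, 0, 6, 165, 0))

# timegm treats the tuple as UTC; utcfromtimestamp turns the resulting
# epoch seconds back into a naive UTC datetime.
dt = datetime.datetime.utcfromtimestamp(calendar.timegm(parsed))
print(dt.isoformat())  # 2015-06-14T12:30:00
```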
help = 'Trigger all the services'

def handle(self, *args, **options):
"""
run the main process
get all the triggers that need to be handled
"""
default_provider.load_services()
trigger = TriggerService.objects.filter(status=True).select_related('consumer__name', 'provider__name')
if trigger:
for service in trigger:
# flag to know if we have to update
to_update = False
# flag to get the status of a service
status = False
# counting the new data to store to display them in the log
count_new_data = 0
# provider - the service that offer data
service_name = str(service.provider.name.name)
service_provider = default_provider.get_service(service_name)

# consumer - the service which uses the data
service_name = str(service.consumer.name.name)
service_consumer = default_provider.get_service(service_name)

# check if the service has already been triggered
# if date_triggered is None, then it's the first run
if service.date_triggered is None:
logger.debug("first run for %s => %s " % (str(
service.provider.name),
str(service.consumer.name.name)))
to_update = True
status = True
# run run run
else:
# 1) get the data from the provider service
# get a timestamp of the last triggered of the service
datas = getattr(service_provider, 'process_data')(
service.provider.token, service.id,
service.date_triggered)
consumer = getattr(service_consumer, 'save_data')

published = ''
which_date = ''

# 2) for each one
for data in datas:
# if one item in a pool of data does not have a date,
# take the previous item's date for it;
# if it's the first one, set it to 00:00:00

# let's try to determine the date contained in
# the data...
published = self.to_datetime(data)

if published is not None:
# get the published date of the provider
published = arrow.get(str(published), 'YYYY-MM-DD HH:mm:ss').to(settings.TIME_ZONE)
# store the date for the next loop
#  if published became 'None'
which_date = published
# ... otherwise set it to 00:00:00 of the current date
if which_date == '':
# current date
which_date = arrow.utcnow().replace(hour=0, minute=0, second=0).to(settings.TIME_ZONE)
published = which_date
if published is None and which_date != '':
published = which_date
# 3) check if the previous trigger is older than the
#  date of the data we retrieved
#  if yes , process the consumer

# add the TIME_ZONE settings
# to localize the current date
date_triggered = arrow.get(str(service.date_triggered), 'YYYY-MM-DD HH:mm:ss').to(
settings.TIME_ZONE)

# if the published date is greater than or equal to the last
# triggered event ... :
if date_triggered is not None and published is not None and published >= date_triggered:

if 'title' in data:
logger.info("date {} >= date triggered {} title {}".format(published, date_triggered,
data['title']))
else:
logger.info("date {} >= date triggered {} ".format(published, date_triggered))

status = consumer(
service.consumer.token, service.id, **data)

to_update = True
count_new_data += 1
# otherwise do nothing
else:
if 'title' in data:
logger.debug("data outdated skipped : [{}] {}".format(published, data['title']))
else:
logger.debug("data outdated skipped : [{}] ".format(published))

# update the date of the trigger at the end of the loop
sentence = "user: {} - provider: {} - consumer: {} - {}"
if to_update:
if status:
logger.info((sentence + " - {} new data").format(
service.user,
service.provider.name.name,
service.consumer.name.name,
service.description,
count_new_data))
self.update_trigger(service)
else:
logger.info((sentence + " AN ERROR OCCURS ").format(
service.user,
service.provider.name.name,
service.consumer.name.name,
service.description))
else:
logger.info((sentence + " nothing new").format(
service.user,
service.provider.name.name,
service.consumer.name.name,
service.description))
else:
self.stdout.write("No trigger set by any user")
read_data.delay()
publish_data.delay()
