
Rewrite the AniDB utility (together with a test!) #248

Merged
merged 9 commits into from Feb 8, 2017

Conversation

jilljenn
Member

@jilljenn jilljenn commented Feb 1, 2017

Fixes #220.

  • AniDB does not allow hotlinks, so we should download the poster (see the management command).
  • The random_ip method does not work anymore on the myAnimeList API :D

parser.add_argument('id', type=int)

def handle(self, *args, **options):
    if options.get('id'):
Member

Is it possible for this to be false?

Member Author

Not really. It's an unnecessary safeguard.

@@ -62,81 +55,30 @@ def get(self, id):

r = self._request("anime", {'aid': id})
soup = BeautifulSoup(r.text.encode('utf-8'), 'xml') # http://stackoverflow.com/questions/31126831/beautifulsoup-with-xml-fails-to-parse-full-unicode-strings#comment50430922_31146912
"""with open('backup.xml', 'w') as f:
f.write(r.text)"""
with open('backup.xml', 'w') as f:
Member

Could you explain why you create this file? What purpose does it serve?

Member Author

It was for debugging, good catch!

# 'artists': ? from anime.creators
'nb_episodes': int(anime.episodecount.string),
'anime_type': str(anime.type.string),
'anidb_aid': id
Member

It is generally a bad idea to name a variable id. Time for a riddle: why?

(Answer: because it shadows the built-in Python function id().)
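Here is a minimal standalone illustration of the riddle. It is not Mangaki code, and the function names are hypothetical. Inside a function whose parameter is called id, the built-in id() is no longer reachable.

```python
def get_shadowed(id):
    # Here `id` is the integer argument, not the built-in function,
    # so calling id(...) fails.
    try:
        return id(object())
    except TypeError as exc:
        return f"TypeError: {exc}"


def get(anidb_aid):
    # With a descriptive parameter name, the built-in id() is still available.
    return anidb_aid, callable(id)


print(get_shadowed(42))  # TypeError: 'int' object is not callable
print(get(42))           # (42, True)
```

Renaming the parameter to anidb_aid, as was done in this PR, avoids the problem without changing the behavior.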

Member Author

Fixed. Now please merge!

@codecov-io

codecov-io commented Feb 1, 2017

Codecov Report

Merging #248 into master will not impact coverage.

@@           Coverage Diff           @@
##           master     #248   +/-   ##
=======================================
  Coverage   56.51%   56.51%           
=======================================
  Files          13       13           
  Lines         706      706           
=======================================
  Hits          399      399           
  Misses        307      307
Impacted Files               Coverage Δ
mangaki/mangaki/models.py    75.33% <ø> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a552f19...1ffc793.

def handle(self, *args, **options):
    anidb = AniDB('mangakihttp', 1)
    anime = create_anime(**anidb.get(options.get('id')))
    anime.retrieve_poster()  # Save for future use
Member

I wasn't sure earlier about the print statements, but maybe printing the anime would make sense here?

Member

Agreed, but I think it should use Django's proper output mechanism for management commands.

Member Author

Stop making me speak English. It's pretentious. But fine, it opens the discussion to more people, so OK.

"""
Allows retrieval of non-file or episode related information for a specific anime by AID (AniDB anime id).
http://wiki.anidb.net/w/HTTP_API_Definition#Anime
"""
id = int(id) # why?
anidb_aid = int(anidb_aid) # why?
Member

why => probably used with a query param, which would be a string?

@jilljenn
Member Author

jilljenn commented Feb 1, 2017

Regarding the failing build, it comes from: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?

That's because when BeautifulSoup is used to parse XML, it needs a parser such as lxml. Should we add it to the requirements, or would that upset you?

@RaitoBezarius
Member

RaitoBezarius commented Feb 1, 2017

@jilljenn IMHO, lxml is often slow to install as a dependency. But BeautifulSoup supports html.parser, which is built in AFAIK and more lenient than lxml. It is slower, but we don't really care about speed.
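A caveat about this thread: BeautifulSoup's "xml" feature is provided by lxml. Switching to html.parser would therefore parse the response as HTML, not as XML. If the goal is to avoid the lxml dependency altogether, the standard library's xml.etree.ElementTree is another option. This sketch uses that swapped-in technique, not what the PR ended up doing. The element names mirror the fields read in the diff, but the sample document and its values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: element names mirror the fields this PR reads
# (type, episodecount, startdate, titles/title[@type="main"], picture);
# the values are invented for illustration.
SAMPLE_XML = """<anime id="9999">
  <type>TV Series</type>
  <episodecount>12</episodecount>
  <startdate>2017-01-05</startdate>
  <titles>
    <title type="main">Example Title</title>
    <title type="official">Another Title</title>
  </titles>
  <picture>0000.jpg</picture>
</anime>"""


def parse_anime(xml_text):
    """Extract a few fields using only the standard library (no lxml)."""
    root = ET.fromstring(xml_text)
    main_title = root.find("./titles/title[@type='main']")
    return {
        "anidb_aid": int(root.get("id")),
        "title": main_title.text if main_title is not None else None,
        "nb_episodes": int(root.findtext("episodecount")),
        "anime_type": root.findtext("type"),
        "picture": root.findtext("picture"),
    }


print(parse_anime(SAMPLE_XML)["title"])  # Example Title
```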

Member

@RaitoBezarius RaitoBezarius left a comment

Summary:

  • I would really prefer that the test case not interact with the network, e.g. AniDB (not spamming them every time our CI runs the tests would be nice).

  • Minor cleanups, e.g. unused modules, nitpicks on URL joining, test behavior, reusability.

After this, 🚀 !
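The first point of this summary can be addressed by replacing the network call with a canned response during tests. The sketch below assumes a client shaped like the one in the diff above, which calls self._request("anime", {'aid': id}). AniDBClient and get_episode_count are hypothetical stand-ins, not the PR's actual class.

```python
import unittest
import xml.etree.ElementTree as ET
from unittest import mock


class AniDBClient:
    """Hypothetical stand-in for the PR's AniDB class; only the shape matters."""

    def _request(self, endpoint, params):
        # The real client would perform an HTTP request to AniDB here.
        raise RuntimeError("network access is not allowed in tests")

    def get_episode_count(self, anidb_aid):
        xml_text = self._request("anime", {"aid": anidb_aid})
        return int(ET.fromstring(xml_text).findtext("episodecount"))


class AniDBClientTest(unittest.TestCase):
    def test_get_episode_count_uses_canned_xml(self):
        fake_xml = '<anime id="1"><episodecount>13</episodecount></anime>'
        client = AniDBClient()
        # Swap the network call for a canned response for the duration of the test.
        with mock.patch.object(client, "_request", return_value=fake_xml) as fake_request:
            self.assertEqual(client.get_episode_count(1), 13)
        fake_request.assert_called_once_with("anime", {"aid": 1})
```

Patching only the private request method keeps the XML-parsing logic under test while guaranteeing that CI never contacts AniDB.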

from mangaki.utils.anidb import AniDB
from mangaki.models import Work, Category
from django.db.models import Count
from urllib.parse import urlparse, parse_qs
Member

parse_qs is unused.

Member Author

Can't we find an automated tool that does this job?

Member

Sure, theoretically autoflake should do the job.
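Tools like autoflake (built on pyflakes) handle this properly. The rough sketch below only shows the underlying idea with the standard ast module and is not how those tools are implemented. It misses names used only in strings, __all__, or string annotations. The sample source is hypothetical.

```python
import ast


def unused_imports(source):
    """Return the names imported in `source` that are never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked this way
                    imported.add(alias.asname or alias.name)
    # Attribute chains such as `sys.argv` start with an ast.Name, so they count.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


SAMPLE = """
from django.core.management.base import BaseCommand, CommandError
from urllib.parse import urlparse, parse_qs
import sys


class Command(BaseCommand):
    def handle(self, *args, **options):
        return urlparse(options["url"])
"""

print(unused_imports(SAMPLE))  # ['CommandError', 'parse_qs', 'sys']
```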

from django.core.management.base import BaseCommand, CommandError
from mangaki.utils.anidb import AniDB
from mangaki.models import Work, Category
from django.db.models import Count
Member

Unused.

from mangaki.models import Work, Category
from django.db.models import Count
from urllib.parse import urlparse, parse_qs
import sys
Member

Unused.

@@ -0,0 +1,25 @@
from django.core.management.base import BaseCommand, CommandError
Member

CommandError is unused.

if 'anidb_aid' in kwargs:
    return Work.objects.update_or_create(category=anime, anidb_aid=kwargs['anidb_aid'], defaults=kwargs)[0]
else:
    return Work.objects.create(category=anime, **kwargs)
Member

What if we don't specify an anidb_aid and create a duplicate by mistake?
Is there any escape hatch to prevent this behavior, which is destroying our DB?

Member Author

Destroying is not the correct word.
Actually, for this function only, the if is unnecessary.

Member

Right. Still, that does not answer the question: if anidb_aid is not specified, can we create a duplicate (since there are no constraints on the DB side or in the model manager)?
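One subtlety relevant to this question: a DB-level unique constraint on anidb_aid (for example one created by unique=True on the field) would reject two works with the same AniDB id. It would still not prevent duplicates among works without one, because SQL unique constraints treat NULLs as distinct. PostgreSQL does the same by default. This sketch demonstrates the behavior with an in-memory SQLite table. The table layout is hypothetical, not Mangaki's schema.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal table; only the UNIQUE column matters here.
    conn.execute("CREATE TABLE work (title TEXT, anidb_aid INTEGER UNIQUE)")
    conn.execute("INSERT INTO work VALUES ('A', 1)")
    try:
        conn.execute("INSERT INTO work VALUES ('A again', 1)")
        duplicate_aid_rejected = False
    except sqlite3.IntegrityError:
        duplicate_aid_rejected = True
    # Two rows without an AniDB id: both are accepted.
    conn.execute("INSERT INTO work VALUES ('B', NULL)")
    conn.execute("INSERT INTO work VALUES ('B', NULL)")
    null_rows = conn.execute(
        "SELECT COUNT(*) FROM work WHERE anidb_aid IS NULL"
    ).fetchone()[0]
    conn.close()
    return duplicate_aid_rejected, null_rows


print(demo())  # (True, 2)
```

So even with a constraint, the create() branch without an anidb_aid can still produce duplicates. Some other key, or an explicit lookup, would be needed for that case.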

"""
Allows retrieval of non-file or episode related information for a specific anime by AID (AniDB anime id).
http://wiki.anidb.net/w/HTTP_API_Definition#Anime
"""
id = int(id) # why?
anidb_aid = int(anidb_aid) # why?
Member

This ensures that the given anidb_aid is an integer. It must have been added to interface with some odd code that called this function with a string. It's not necessary: we could just assume that anidb_aid is an int and document that with a type hint.
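One point worth keeping in mind when weighing the int() call against a type hint: annotations are not enforced at runtime. A hint documents the expected type, but a string coming from a query parameter still passes through unchanged. The two hypothetical functions below show the difference.

```python
def get_unchecked(anidb_aid: int):
    # Annotations are metadata only: a str from a query string passes through as-is.
    return anidb_aid


def get_checked(anidb_aid: int) -> int:
    # Explicit coercion, as in the PR's `anidb_aid = int(anidb_aid)` line.
    return int(anidb_aid)


print(repr(get_unchecked("248")))  # '248'
print(repr(get_checked("248")))    # 248
```

Relying on the hint alone is fine if every caller already passes an int, for example because the management command declares type=int. Otherwise the explicit conversion is what actually guarantees the type.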

'picture': "http://img7.anidb.net/pics/anime/" + str(anime.find('picture').string),
}, partial=True, updater=lambda: self.get(anime.id)))

results.append(
Member

It seems to me that this function is a good candidate for being turned into a generator, provided that animetitles.find_all is also a generator, of course.

Otherwise, I wonder whether it would be better to use a dict to store the anime by ID. Anyway, this is okay.

def __repr__(self):
    return u'<Anime %i "%s">' % (self.id, self.title)
all_titles = anime.titles
# creators = anime.creators
Member

Add a TODO or FIXME marker so that someone can catch it by running ag 'TODO' mangaki/** when bored.

anime_dict = {
'title': str(all_titles.find('title', attrs={'type': "main"}).string),
'source': 'AniDB: ' + str(anime.url.string) if anime.url else None,
'ext_poster': 'http://img7.anidb.net/pics/anime/' + str(anime.picture.string),
Member

Hotlink is hot. Could we use urlparse.urljoin rather than concatenating manually? (just for extra safety!)

Member Author

You're speaking Python 2.7. In Python 3, it's from urllib.parse import urljoin.

Member

I'm getting old. :'(
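Here is a short sketch of the urljoin version. The base URL is the one in the diff, and the picture file names are made up. One caveat: urljoin is tidier than concatenation, but it is not a sanitizer. The base needs its trailing slash, and a value starting with "/" replaces the whole path instead of being appended.

```python
from urllib.parse import urljoin

# Base URL taken from the diff; the picture file names below are invented.
POSTER_BASE = "http://img7.anidb.net/pics/anime/"


def poster_url(picture):
    """Resolve an AniDB <picture> value against the poster base URL."""
    return urljoin(POSTER_BASE, picture)


print(poster_url("1234.jpg"))
# http://img7.anidb.net/pics/anime/1234.jpg

# Without the trailing slash, the last path segment is replaced.
print(urljoin("http://img7.anidb.net/pics/anime", "1234.jpg"))
# http://img7.anidb.net/pics/1234.jpg

# A leading slash resets the path to the host root.
print(poster_url("/1234.jpg"))
# http://img7.anidb.net/1234.jpg
```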

'source': 'AniDB: ' + str(anime.url.string) if anime.url else None,
'ext_poster': 'http://img7.anidb.net/pics/anime/' + str(anime.picture.string),
# 'nsfw': ?
'date': datetime(*list(map(int, anime.startdate.string.split("-")))),
Member

Can we extract this magic expression into a utility function at the top of this file,
so that we can easily reuse this datetime-parsing logic for every field that needs it?

Member Author

I don't remember where it comes from, but yes.

Member

As I recall, it comes from the original library.
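The extraction the reviewer asks for could look like this sketch. parse_anidb_date and parse_dates are hypothetical names, and "enddate" is an assumed field name. The helper also tolerates dates truncated to "YYYY-MM" or "YYYY" by defaulting the missing parts to 1. That is an assumption about the API; the original datetime(*map(int, ...)) expression would raise a TypeError on such input.

```python
from datetime import datetime


def parse_anidb_date(value):
    """Parse 'YYYY-MM-DD' into a datetime; return None for empty input.

    Assumption: truncated dates ('YYYY-MM' or 'YYYY') may occur, in which
    case the missing month/day default to 1.
    """
    if not value:
        return None
    parts = [int(p) for p in value.strip().split("-")]
    year, month, day = (parts + [1, 1])[:3]
    return datetime(year, month, day)


def parse_dates(raw, fields=("startdate", "enddate")):
    """Apply the same parsing to every date field of a raw record."""
    return {field: parse_anidb_date(raw.get(field)) for field in fields}


print(parse_anidb_date("2017-02-08"))  # 2017-02-08 00:00:00
print(parse_dates({"startdate": "1999-01-03"}))
```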

# characters = anime.characters
# ratings = anime.ratings.{permanent, temporary}

print(urljoin('http://img7.anidb.net/pics/anime/', str(anime.picture.string)))
Member

Debugging prints.

Member Author

Done.

except:
pass
# str = str
def to_python_datetime(mal_date):
Member

Good candidate for doctesting.

Member Author

Done.

Member

Doctesting, not just docs 😄! (Unless GitHub didn't reload the code.)

Member Author

Does it count toward Codecov? I don't think it's that useful, but it seems like a good thing to learn.

Member Author

Done.

Member

It counts toward Codecov (theoretically)!
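Doctests only contribute to coverage if the test run actually executes them, for instance via nose's --with-doctest option or by running them from the test suite. This sketch shows the second approach with the standard doctest module. parse_episode_count is a hypothetical helper, not code from the PR.

```python
import doctest


def parse_episode_count(text):
    """Convert an <episodecount> value to int, treating blanks as unknown.

    >>> parse_episode_count("13")
    13
    >>> parse_episode_count(" 26 ")
    26
    >>> parse_episode_count("") is None
    True
    """
    text = (text or "").strip()
    return int(text) if text else None


def run_doctests(obj):
    """Run the doctests attached to `obj` and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass explicit globals so the examples can see the function under test.
    for test in finder.find(obj, obj.__name__, globs={obj.__name__: obj}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


print(run_doctests(parse_episode_count))  # (0, 3)
```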

@RaitoBezarius RaitoBezarius merged commit dccef93 into master Feb 8, 2017
jilljenn added a commit that referenced this pull request Feb 12, 2017
* Rewrite the AniDB utility (together with a test!)

* Remove useless lines

* Rename variables

* Add style

* Remove useless code

* Remove unit tests for AniDB

* Improve style

* Add doctesting
jilljenn pushed a commit that referenced this pull request Feb 12, 2017
* vagrant: update and repair the Vagrantfile

* vagrant: transfer the current user's keys into the machine

* readme: update instructions

* vagrant(bootstrap): install virtualenv as a user rather than globally

* vagrant(size): warning about the size taken by the installation

* Store posters in a FileField (#235)

* Rename Work.poster to Work.ext_poster

* Be less aggressive with local poster handling

 - Do not remove the external poster URL when downloading it locally; it is still a piece of possibly valuable information (especially since it is the only way we currently have to link an anime to MAL!)
 - If there is no locally available poster, fall back to the external URL.

* Make admin action for refreshing posters actually work

* Ignore media/ directory for development environments

* Add FileField for posters

* Clean up poster retrieval

Various fixes for poster retrieval:
 - `retrieve_poster` now uses the requests library and is moved to a method on the Work class. It downloads the known external poster by default.
 - The admin action for updating posters no longer tries to re-download an existing poster onto itself.

* Add management command to bulk download posters

* Make retrieve_poster update the external poster URL

* Update seed data to replace Work.poster with Work.ext_poster

* Merge migrations

* Address PR comments

* Fix path and README (#244)

* Add line to admin and change news (#245)

Update news

* script(bootstrap): remove the comment which got in the ALLOWED_HOSTS array…

* Code coverage is now browsable on the codecov website (#252)

* coverage has to run in the proper folder, and nosetests needs to know where the tests are

* circle: fix path to manage.py for the coverage run

* Rewrite the AniDB utility (together with a test!) (#248)

* Rewrite the AniDB utility (together with a test!)

* Remove useless lines

* Rename variables

* Add style

* Remove useless code

* Remove unit tests for AniDB

* Improve style

* Add doctesting
jilljenn added a commit that referenced this pull request Feb 14, 2017
jilljenn pushed a commit that referenced this pull request Feb 14, 2017
jilljenn added a commit that referenced this pull request Feb 14, 2017
* This is a commit.

* Upper and latter corrected

* Written tests for mangaki

* Should be okay now.

* Fixed stuff: DRY principle applied. URL to work applied. All correct handlers created.

* Added 500 base error view in case of DatabaseError.

* Looks better that way.

* management_commands: add a sketch of generate seed data command for every purpose

* seed data generation: use the temp database
More verbose output to understand how the process is moving
Fix the argument parsing

* experiment: generate seed data w/o database cloning

* Make tests pass, and actually run them on CircleCI. (#223)

Merging with approval from @RaitoBezarius.

* Do not require discourse settings for avatar initialization (#229)

* Clean up signal handling (#231)

* Clean up signal handling

 - We no longer use a post_save signal for updating scores when a Suggestion is
   changed; rather this has its place directly in the Suggestion.save method.
 - Signals are connected in the ready() method of Mangaki's new application
   configuration class, as recommended by Django's documentation:
   https://docs.djangoproject.com/en/dev/topics/signals/#connecting-receiver-functions
   Note that we use a receivers module instead of a signals module to allow for
   the creation and import of custom signals without registering handlers if
   the need ever arises.
 - Profile creation is no longer tied to login with django-allauth but rather
   to the actual User model creation. This helps handling corner cases such as
   accounts created through the `manage.py createsuperuser` management command
   actually having a profile.

* Update tests for automatic profile creation

* Address PR comments

* circle: get back to the project root folder after tests (#233)

* Add tests for searching works (#232)

* Upgrade Mangaki to Django 1.10 (#234)

* Lint the last commit in the CI (#226)

* Disable git lint until we can configure it properly

* Add some tests ensuring views are not crashing (#237)

* Fix typo (unreviewed)

* train_test_split moved in sklearn 0.18 (#240)

Fixes #239

* Move Mangaki into the root folder (#227)

* The `mangaki` folder content has been moved to the root of the repository.
* After pulling this commit, many files will be reorganized, backup your work tree before pulling.
* Pay attention to your `settings.ini` and path-dependent code, though, they should not be affected by this change.

* unreviewed: fix spacing in README

* Revert "unreviewed: fix spacing in README"

This reverts commit f9848a9.

* Revert "Move Mangaki into the root folder (#227)"

This reverts commit 74d8749.

This breaks mangaki.

* Store posters in a FileField (#235)

* Rename Work.poster to Work.ext_poster

* Be less aggressive with local poster handling

 - Do not remove the external poster URL when downloading it locally; it is still a piece of possibly valuable information (especially since it is the only way we currently have to link an anime to MAL!)
 - If there is no locally available poster, fall back to the external URL.

* Make admin action for refreshing posters actually work

* Ignore media/ directory for development environments

* Add FileField for posters

* Clean up poster retrieval

Various fixes for poster retrieval:
 - `retrieve_poster` now uses the requests library and is moved to a method on the Work class. It downloads the known external poster by default.
 - The admin action for updating posters no longer tries to re-download an existing poster onto itself.

* Add management command to bulk download posters

* Make retrieve_poster update the external poster URL

* Update seed data to replace Work.poster with Work.ext_poster

* Merge migrations

* Address PR comments

* Fix path and README (#244)

* Add line to admin and change news (#245)

Update news

* Code coverage is now browsable on the codecov website (#252)

* coverage has to run in the proper folder, and nosetests needs to know where the tests are

* circle: fix path to manage.py for the coverage run

* Rewrite the AniDB utility (together with a test!) (#248)

* Rewrite the AniDB utility (together with a test!)

* Remove useless lines

* Rename variables

* Add style

* Remove useless code

* Remove unit tests for AniDB

* Improve style

* Add doctesting

* Update and repair the Vagrantfile (#243)

* vagrant: update and repair the Vagrantfile

* vagrant: transfer the current user's keys into the machine

* readme: update instructions

* vagrant(bootstrap): install virtualenv as a user rather than globally

* vagrant(size): warning about the size taken by the installation

* Store posters in a FileField (#235)

* Rename Work.poster to Work.ext_poster

* Be less aggressive with local poster handling

 - Do not remove the external poster URL when downloading it locally; it is still a piece of possibly valuable information (especially since it is the only way we currently have to link an anime to MAL!)
 - If there is no locally available poster, fall back to the external URL.

* Make admin action for refreshing posters actually work

* Ignore media/ directory for development environments

* Add FileField for posters

* Clean up poster retrieval

Various fixes for poster retrieval:
 - `retrieve_poster` now uses the requests library and is moved to a method on the Work class. It downloads the known external poster by default.
 - The admin action for updating posters no longer tries to re-download an existing poster onto itself.

* Add management command to bulk download posters

* Make retrieve_poster update the external poster URL

* Update seed data to replace Work.poster with Work.ext_poster

* Merge migrations

* Address PR comments

* Fix path and README (#244)

* Add line to admin and change news (#245)

Update news

* script(bootstrap): remove the comment which got in the ALLOWED_HOSTS array…

* Code coverage is now browsable on the codecov website (#252)

* coverage has to run in the proper folder, and nosetests needs to know where the tests are

* circle: fix path to manage.py for the coverage run

* Rewrite the AniDB utility (together with a test!) (#248)

* Rewrite the AniDB utility (together with a test!)

* Remove useless lines

* Rename variables

* Add style

* Remove useless code

* Remove unit tests for AniDB

* Improve style

* Add doctesting

* Add new WALS algorithm from TensorFlow (#246)

* Add new WALS algorithm from TensorFlow

* Add WALS file

* Improve style

* Minor cleanup around the codebase (#253)

* Fix syntax error in `reco_list.html`

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Mutable arguments are dangerous

Default to None; if it's None, replace them with empty arrays.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* {decode,encode}string are deprecated
It's {decode,encode}bytes now.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Reference local variable `now` properly before the loop

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Time to import time for the `retrieveposters` mgt command

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Reference the `nb_ratings` variable in the good scope.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Remove unused imports from `zero.py`

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Remove unused imports

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Remove unused imports
(import missing models for knn.py also)

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Import missing modules for NMF
(otherwise, I don't see how it was working…)

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Requirements refactoring into folders
Add matplotlib as requirement

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

* Remove more unused imports and add used imports

* requirements: add production

* cleanup: edit the README about requirements, remove old README in mangaki/

* readme: typo

* hotfix: s/requirements-dev.txt/requirements/dev.txt (#258)

* Ansible deployment for production (#171)

* Rough ansible provisioning

* ansible(roles): add initial roles

* ansible(templates): add settings

* ansible(vars): add mangaki

* ansible(playbook): add the playbook

* ansible(gitignore): ignore the hosts inventory and secrets vars

* ansible(readme): explain about secrets

* WIP 2

* Ansible: WIPv3

All remaining is unattended upgrades and email backend configuration.

— Make Let's Encrypt renewal and setup work.
— Add timers for ranking / top director.
— Run Mangaki app server with supervisord.
— Install lxml / numpy package to speed the pip install.

* ansible: remove old mangaki.yml vars

* ansible(email): add external smtp server support

* Address PR comments

* ansible: Refactor LE and NGINX into a same role

* ansible: improve apt cache management and remove useless steps

* ansible(letsencrypt): Force-restart NGINX after its installation, pin dehydrated's version

* ansible(readme): make it more useful

* ansible(nginx): remove escape character ' from main.yml

* Remove useless notebooks (still ongoing) (#257)

* Remove useless notebooks (still ongoing)

* Cleanup notebooks

* Notebook with examples and tests of SVD and DPP (#195)

* create the first notebook

* start the notebook on the graph

* minor changes to the notebook, personal tests

* minor changes: formatting

* continue the notebook, copy over the beginning of a compute_similarity_cosine function

* remove old files

* continue the notebook, first function to obtain a similarity matrix

* add a second, less computationally expensive cosine function; start testing the DPP

* progress on the notebook (DPP, distance, comparison), a bit (a lot) messy, do not read

* _notebook stage_ notebook: better implementation of cosine, start of Jaccard; _essai_ notebook: experiments, start of the order-r diameter

* a few more tests (the one proposed by jj in the _notebook stage_ and _essai, test_ notebooks), start of the class (just a draft), creation of a sparse matrix to use ratings.csv directly instead of having to run SVD every time

* class for the DPP in dpp.py and a new _DPP_ notebook for testing

* Add file for requirements for algorithms

* Update notebooks with annotations

* Modify dpp.py

Add and modify the classes and functions needed to verify, test (compare), and implement the DPP (determinantal point process)

* Changes to dpp.py

Minor changes, add Jaccard (ready-made function), more readable code with fewer errors
Direct use of the database not done yet, but coming soon

* Changes to dpp.py and add draft notebooks

Retrieve items from the database instead of ratings.csv
Draft notebooks "Test classe" and "Test classe-Copy1", where tests were run.
They show that a major problem remains: some elements are or become "nan" during certain operations

* Modify the upcoming SimilarityMatrix class in dpp.py

No notable change in dpp.py
Create "test de Similarity[...]" to create, test, and modify the Similarity class. In progress: building the matrix from the raw data, limiting the calls and the users and/or works that have no rating

* new

* Changes to dpp.py

Continue and nearly finish dpp.py: the SimilarityMatrix class has been largely rewritten.
The following classes, namely MangakiUniform and MangakiDPP, and the compare function were modified accordingly

* Changes to dpp.py and to the test notebook "test de similarity_matrix qui sera ds dpp.py.ipynb"

Remove an unnecessary function in dpp.py
Update the notebook: dpp.py is tested near the end (see the heading "Test "final" de dpp.py")

* Changes to dpp.py

Changes following elarnon's remarks, mainly:
-new SimilarityMatrix constructor
-code closer to the PEP8 recommendations (checked with flake8)
-use of a sparse matrix instead of a matrix filled with zeros
-use of the np.random.choice function instead of random.shuffle

The compare function requested by elarnon still needs to be written; it is called compare2 in the meantime

* Changes to dpp.py

Changes mostly to compare2 (which will replace compare):
-switch from the order-1 diameter to the order-0 diameter
-change the function's synopsis and arguments
To do: test it, in any case

* Save

* Checkpoint

* Latest changes to the files for the DPP-related algorithms:
-dpp.py
-buildmatrix.py, which contains a class that mainly builds a ratings matrix (users as rows, works/items as columns, ratings in the cells) from the database or a CSV file

Change to make: if buildmatrix.py is kept, remove the BuildMatrix class from dpp.py

* Add pandas to requirements

* Remove useless file

* Determinantal Point Processes (#201)

* implement dpp in mangaki

* Latest version of dpp.py and buildmatrix.py

Seems to work well and complies with PEP8

* Modify dpp.py

Rename the variables in the diameter_0 function (no accents, in English)

* Modify buildmatrix.py

Fix a careless mistake: a "rating" had slipped in where a "choice" belonged...

* Integrate DPP into the site

Start

* progress

* Not much new: we need to take into account that the DPP lets you choose to have only manga or only anime

* Progress on the DPP integration

* URL problem, and a view to determine whether DPP mode should be on or not

* some changes

* some changes

* old files

* attempt

* Integrate DPP into the site, with recommendations this time

Could be improved (nearly duplicate functions for the versions with and without DPP)

* small changes, not finished yet

* master

* fixes in progress

* changes

* more fixes (not finished yet)

* issue: displaying only anime or only manga does not work

* fixes

* forgotten bits

* remove an unnecessary migration (already present in master, in fact)

* remove DPP recommendations for regular users

* start of fixes

* Fixes

Popularity issue should be fixed (but not really tested, since I only have 20 works in total ^^). If testing from a big seed: increase the number of works taken into account in popular (dpp in models.py) and the number of points in the DPP sample (in views.py)
URL issue with sort and the DPP keywords still to be done

* improvements

* experiments

* changes; still some things to do (the URLs, and see elarnon's latest remarks)

* improve the URLs

* The DPP URL issue is resolved
Rename the "DPP" titles to "Découvrir" ("Discover")

* last (?) fixes following elarnon's messages, unless I'm mistaken. The code still needs cleaning up, and PEP8 compliance needs checking

* better PEP8 compliance

* fixes

* latest master

* Change the migration dependencies so that it works (the management command works, and so does loaddata for the seed (but not for the big seed :/))

* Remove a useless notebook that did not belong here

* fixes, still to be verified by testing on the dev version of mangaki

* Fixes following raito's remarks

* Remove unnecessary temporary variables in ratingsmatrix.py

* typography

* new, more accurate error message

* Remove heavy files, add migrations

* Fix tests

* Fix test

* Clean code

* Add migration for tropes

* Add request to server_error, remove handler500 from urls
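The "Mutable arguments are dangerous" commit in the list above refers to a classic Python pitfall. A default value is evaluated once, when the function is defined, so a mutable default such as a list is shared across calls. The sketch uses hypothetical function names to show the bug and the "default to None" fix described in that commit.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at definition time, and reused.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Default to None and build a fresh list on each call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag_buggy("a"), add_tag_buggy("b"))  # ['a', 'b'] ['a', 'b']
print(add_tag("a"), add_tag("b"))              # ['a'] ['b']
```

Both buggy calls print the same list object, which by then contains both tags.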
@RaitoBezarius RaitoBezarius deleted the jj/fix-anidb branch April 30, 2017 13:07
@RaitoBezarius RaitoBezarius restored the jj/fix-anidb branch April 30, 2017 13:07
@RaitoBezarius RaitoBezarius deleted the jj/fix-anidb branch October 29, 2017 15:45
4 participants