Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include dataset and class descriptions in CSV for import/export - AB-387 #333

Merged
merged 5 commits into from Jun 6, 2019

Conversation

@aidanlw17
Copy link
Contributor

commented Mar 30, 2019

Import/Export csv w/ description support

This pull request addresses the issues noted in ticket AB-387
https://tickets.metabrainz.org/projects/AB/issues/AB-387?filter=allopenissues

Import:
  • alter _parse_dataset_csv(file) in webserver/views/datasets.py to
    handle dataset and class descriptions on any line of csv file
  • csv line format for dataset description:
    description,description of dataset dataset
  • csv line format for class description:
    description:class1,description of class 1
  • _parse_dataset_csv(file) returns [dataset_description, classes]
    where dataset_description is description of entire dataset and classes
    is a list of all class dictionaries with structure:
    {'recordings': [mbid1,...,mbidn], 
    'description': 'class description', 
    'name': 'class_name'}
  • small change to db/dataset.py function
    create_from_dict(dictionary,author_id) to support class and dataset
    dictionaries containing a description
Export:
  • in webserver/views/datasets.py function
    _convert_dataset_to_csv_stringio(dataset) adds dataset description and
    class descriptions to top of csv if they exist
  • format:
    description,dataset description
    description:class1,description of class one 
    description:class2,description of class two 
    ... 
    mbid,classname
    ... 

Create unit test test_parse_dataset_csv in `webserver/views/test:

  • add directory webserver/views/test_data with file test_db.csv
  • add test_data path to webserver/testing.py

Altered unit test for test_dataset_to_csv:

  • add support for class and dataset descriptions
@aidanlw17

This comment has been minimized.

Copy link
Contributor Author

commented Mar 30, 2019

@alastair This PR fixes what we discussed in PR #332 by isolating changes related to ticket AB-387. Note that I closed PR #332 as well. Apologies for the delay on this - I was buried in midterms this week! I changed the if clauses as you suggested. Let me know if you have any other feedback. Thanks!

@alastair
Copy link
Contributor

left a comment

Thanks for the updated patch. I've looked through it and added some more changes of my own. Please take a look at the changes that I made and see if you understand why I made them. Feel free to ask me if you have any questions about why I changed something or did something in a particular way.


else:
return render_template("datasets/import.html", form=form)


def _parse_dataset_csv(file):
classes_dict = defaultdict(list)
"""Parse a csv file containing a representation of a dataset.

This comment has been minimized.

Copy link
@alastair

alastair Apr 16, 2019

Contributor

I added this docstring - while it wasn't a particular part of this PR, I saw when I was editing this method that it had some complexity and no documentation, so I also updated this

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Makes sense. In future contributions if I see complexity building up I will add a docstring to document!

{"name": class name, "description": class description, "recordings": []}
a class is only returned if there are recordings for it. A class
"""
classes_dict = defaultdict(lambda: {"description": None, "recordings": []})

This comment has been minimized.

Copy link
@alastair

alastair Apr 16, 2019

Contributor

I changed the definition of this defaultdict so that we didn't have to use setdefault. Both methods work, but I think this is a bit clearer.
A small thing to note in your use of setdefault: because the original definition of this was classes_dict = defaultdict(list), when you used setdefault to set it to a dictionary of dictionary, my type checker complained that the parameter was a dictionary instead of a list. that is

classes_dict = defaultdict(list)
classes_dict['class'] = {}
classes_dict['class']['recordings'] = ...

is "wrong" because the types of list/dict don't match

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Noted. That's definitely a cleaner, simplified method and I recognize the type error associated with that use of setdefault.

return classes

for name, class_data in six.iteritems(classes_dict):
if class_data["recordings"]:

This comment has been minimized.

Copy link
@alastair

alastair Apr 16, 2019

Contributor

I added this check here because the dataset validator when you upload a new dataset requires at least two recordings in each class. If someone had created a class with a description and no recordings then the upload would have failed

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Thanks, I hadn't properly considered there being a minimum on the number of recordings. I have a couple of follow-up questions. With the minimum being two recordings, should we check for class_data["recordings"] to contain at least two recordings, rather than just checking that it isn't empty? Additionally, since we have this check, need we check below in classes.append for "recordings" in class_data?

This comment has been minimized.

Copy link
@alastair

alastair Apr 17, 2019

Contributor

No, we don't need this check because during the upload process we call db.dataset.create_from_dict, which runs dataset validation first:

dataset_validator.validate(dictionary)

This will show an error on the upload page if the file doesn't have enough recordings

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Understood. Thanks, again!

self.assertEqual(dataset_csv, expected)

@mock.patch("db.dataset.create_from_dict")
def test_import_csv_invalid_form(self, mock_create_from_dict):

This comment has been minimized.

Copy link
@alastair

alastair Apr 16, 2019

Contributor

We had no tests for the import csv page, so I added some here because we're working in the same part of the app

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Noted, the methods in these tests make sense to me.

"public": True,
}
try:
dataset_id = db.dataset.create_from_dict(dataset_dict, current_user.id)
except dataset_validator.ValidationException as e:
raise BadRequest(str(e))
flash.info("Dataset has been imported successfully.")
return redirect(url_for(".view", id=dataset_id))
return redirect(url_for(".view", dataset_id=dataset_id))

This comment has been minimized.

Copy link
@alastair

alastair Apr 16, 2019

Contributor

There was a bug here which caused this page to not load... You probably should have encountered this if you tested this import page. Did you see the issue?

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Yes I did see that issue, I should have brought it up - apologies. One thing that I noted, this issue only happens when importing CSV files on localhost - importing CSV on acousticbrainz.org does not experience the redirect error. Do you know why this is?

This comment has been minimized.

Copy link
@alastair

alastair Apr 17, 2019

Contributor

We've not released the change that introduced this bug yet, which is why it's not present on acousticbrainz.org. If you see a problem like this and you're not sure if it's a problem or not, you can always ask us about it, in IRC, or a ticket comment, or open a new ticket.

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Okay makes sense, thanks for clearing that up! If I'm confused about a problem like that I'll be sure to ask about it along the way.

@alastair

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

@paramsingh a review here on my changes would be great, thanks

@alastair

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

I'm looking at the failing tests - this is weird. I'm not seeing the problem on my machine (even after rebuilding the base image).
It looks like it's to do with the version of werkzeug. This was changed recently:
pallets/werkzeug@ab6150f#diff-28c7ae5b202ea91826688f8703013044L301
We should look at pinning the version of werkzeug in brainzutils or ab to ensure that we get reproducible results

<pre>description,A dataset of genres<br>description:rock,Songs that fall into the rock genre<br>description:pop,The subset of rock containing 'popular' music</pre>
</figure>
Description lines can go anywhere in the file, but we recommend that you add them at the top.
Any class with a description but will no recordings will not be imported.

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 Apr 17, 2019

Author Contributor

Small language issue on line 27 - should it read "but with no recordings" rather than "but will no recordings"?

paramsingh added a commit to paramsingh/brainzutils-python that referenced this pull request Apr 25, 2019
aidanlw17 and others added 5 commits Mar 30, 2019
Import/Export csv w/ description support - AB-387
This addresses the issues noted in ticket AB-387

Import:
- alter _parse_dataset_csv(file) in webserver/views/datasets.py to
  handle dataset and class descriptions on any line of csv file
- csv line format for dataset description:
      description,description of dataset
- csv line format for class description:
            description:class1,description of class 1
- _parse_dataset_csv(file) returns [dataset_description, classes]
  where dataset_description is description of entire dataset and
  classes is a list of all class dictionaries with structure:
    {'recordings': [mbid1,...,mbidn]
    'description': 'class description',
    'name': 'class_name'}
- small change to db/dataset.py function
  create_from_dict(dictionary,author_id) to support class and dataset
  dictionaries containing a description

Export:
- in webserver/views/datasets.py function
  _convert_dataset_to_csv_stringio(dataset) adds dataset description and
  class descriptions to top of csv if they exist
- format:
    description,dataset description
    description:class1,description of class one
    description:class2,description of class two
    ...
    mbid,classname
    ...

Create unit test test_parse_dataset_csv:
- add test_data directory in webserver/views with file test_db.csv
- add test_data path to webserver/testing.py

Altered unit test for test_dataset_to_csv:
- add support for class and dataset descriptions
Fix to if clauses in webserver/views/datasets.py
This commit fixes the changes to if clauses in import_csv()
and _convert_dataset_to_csv_stringio(dataset) as requested
by in previous review of the solution to AB-387.
Add documentation and tests for import csv view
- Additional documentation on the import csv page
- Tests to check the validity of the import csv form
- Disable CSRF checking in unit tests
- More docstrings for data parsing methods
- Don't create a class which only has a description and no recordings
Fix redirect to dataset view broken in 06e8968
Parameter name to view was changed but call to url_for wasn't updated

@alastair alastair force-pushed the aidanlw17:ab-387 branch from 4c8e5d4 to cc359f8 Jun 6, 2019

@alastair alastair merged commit cc359f8 into metabrainz:master Jun 6, 2019

1 check passed

Jenkins Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.