
Include dataset and class descriptions in CSV for import/export - AB-387 #333

Merged
merged 5 commits into metabrainz:master on Jun 6, 2019

Conversation

@aidanlw17 aidanlw17 (Contributor) commented Mar 30, 2019

Import/Export csv w/ description support

This pull request addresses the issues noted in ticket AB-387
https://tickets.metabrainz.org/projects/AB/issues/AB-387?filter=allopenissues

Import:
  • alter _parse_dataset_csv(file) in webserver/views/datasets.py to
    handle dataset and class descriptions on any line of csv file
  • csv line format for dataset description:
    description,description of dataset
  • csv line format for class description:
    description:class1,description of class 1
  • _parse_dataset_csv(file) returns [dataset_description, classes]
    where dataset_description is description of entire dataset and classes
    is a list of all class dictionaries with structure:
    {'recordings': [mbid1,...,mbidn], 
    'description': 'class description', 
    'name': 'class_name'}
  • small change to db/dataset.py function
    create_from_dict(dictionary,author_id) to support class and dataset
    dictionaries containing a description
Export:
  • in webserver/views/datasets.py function
    _convert_dataset_to_csv_stringio(dataset) adds dataset description and
    class descriptions to top of csv if they exist
  • format:
    description,dataset description
    description:class1,description of class one 
    description:class2,description of class two 
    ... 
    mbid,classname
    ... 

Create unit test test_parse_dataset_csv in `webserver/views/test`:

  • add directory webserver/views/test_data with file test_db.csv
  • add test_data path to webserver/testing.py

Altered unit test for test_dataset_to_csv:

  • add support for class and dataset descriptions
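
For reference, here is a minimal sketch of the parsing behaviour described above. It is a simplification for illustration only, not the implementation in webserver/views/datasets.py (the function name, error handling and return shape here are assumptions):

    import csv
    from collections import defaultdict

    def parse_dataset_csv(fp):
        # Sketch only: split description lines from "mbid,class" lines.
        dataset_description = None
        classes_dict = defaultdict(lambda: {"description": None, "recordings": []})
        for row in csv.reader(fp):
            if len(row) < 2:
                continue
            key, value = row[0], row[1]
            if key == "description":
                # description,<description of dataset>
                dataset_description = value
            elif key.startswith("description:"):
                # description:<class name>,<description of class>
                class_name = key.split(":", 1)[1]
                classes_dict[class_name]["description"] = value
            else:
                # <mbid>,<class name>
                classes_dict[value]["recordings"].append(key)
        classes = [{"name": name,
                    "description": data["description"],
                    "recordings": data["recordings"]}
                   for name, data in classes_dict.items()]
        return [dataset_description, classes]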
@aidanlw17 aidanlw17 (Contributor, Author) commented Mar 30, 2019

@alastair This PR fixes what we discussed in PR #332 by isolating changes related to ticket AB-387. Note that I closed PR #332 as well. Apologies for the delay on this - I was buried in midterms this week! I changed the if clauses as you suggested. Let me know if you have any other feedback. Thanks!

@alastair alastair (Contributor) left a comment

Thanks for the updated patch. I've looked through it and added some more changes of my own. Please take a look at the changes that I made and see if you understand why I made them. Feel free to ask me if you have any questions about why I changed something or did something in a particular way.



    else:
        return render_template("datasets/import.html", form=form)


def _parse_dataset_csv(file):
    classes_dict = defaultdict(list)
    """Parse a csv file containing a representation of a dataset.
@alastair alastair (Contributor) Apr 16, 2019

I added this docstring. While it wasn't strictly part of this PR, I saw when I was editing this method that it had some complexity and no documentation, so I also updated this.

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Makes sense. In future contributions, if I see complexity building up I will add a docstring to document it!


{"name": class name, "description": class description, "recordings": []}
a class is only returned if there are recordings for it. A class
"""
classes_dict = defaultdict(lambda: {"description": None, "recordings": []})
@alastair alastair (Contributor) Apr 16, 2019

I changed the definition of this defaultdict so that we didn't have to use setdefault. Both methods work, but I think this is a bit clearer.
A small thing to note in your use of setdefault: because the original definition of this was classes_dict = defaultdict(list), when you used setdefault to set it to a dictionary of dictionaries, my type checker complained that the parameter was a dictionary instead of a list. That is,

classes_dict = defaultdict(list)
classes_dict['class'] = {}
classes_dict['class']['recordings'] = ...

is "wrong" because the types of list/dict don't match

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Noted. That's definitely a cleaner, simplified method and I recognize the type error associated with that use of setdefault.
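
To make the trade-off above concrete, here is a small self-contained sketch; the class name and mbid value are made up for illustration:

    from collections import defaultdict

    # Original approach: a list-valued defaultdict plus setdefault. It works,
    # but the declared value type (list) no longer matches what is stored
    # (a dict), which is what the type checker complained about.
    classes_dict = defaultdict(list)
    class1 = classes_dict.setdefault("class1", {"description": None, "recordings": []})
    class1["recordings"].append("some-mbid")

    # Approach used in the diff above: every missing key starts out with the
    # right shape, so no setdefault calls are needed.
    classes_dict = defaultdict(lambda: {"description": None, "recordings": []})
    classes_dict["class1"]["recordings"].append("some-mbid")
    classes_dict["class1"]["description"] = "description of class 1"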


    return classes

    for name, class_data in six.iteritems(classes_dict):
        if class_data["recordings"]:
@alastair alastair (Contributor) Apr 16, 2019

I added this check here because the dataset validator requires at least two recordings in each class when you upload a new dataset. If someone had created a class with a description and no recordings, the upload would have failed.

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Thanks, I hadn't properly considered there being a minimum on the number of recordings. I have a couple of follow-up questions. With the minimum being two recordings, should we check for class_data["recordings"] to contain at least two recordings, rather than just checking that it isn't empty? Additionally, since we have this check, need we check below in classes.append for "recordings" in class_data?

@alastair alastair (Contributor) Apr 17, 2019

No, we don't need this check because during the upload process we call db.dataset.create_from_dict, which runs dataset validation first:

dataset_validator.validate(dictionary)

This will show an error on the upload page if the file doesn't have enough recordings

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Understood. Thanks again!
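
As a summary of the exchange above, the pattern ends up looking roughly like this. This is a sketch based on the diff fragment shown here, with made-up sample data, not the exact code:

    import six

    classes_dict = {
        "class1": {"description": "rock songs", "recordings": ["mbid-1", "mbid-2"]},
        "class2": {"description": "description only, no recordings", "recordings": []},
    }

    classes = []
    for name, class_data in six.iteritems(classes_dict):
        # Skip classes that only have a description: the dataset validator
        # (run later by db.dataset.create_from_dict) rejects classes that do
        # not have enough recordings.
        if class_data["recordings"]:
            classes.append({
                "name": name,
                "description": class_data["description"],
                "recordings": class_data["recordings"],
            })
    # Only "class1" ends up in classes; "class2" is dropped.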


        self.assertEqual(dataset_csv, expected)

    @mock.patch("db.dataset.create_from_dict")
    def test_import_csv_invalid_form(self, mock_create_from_dict):
@alastair alastair (Contributor) Apr 16, 2019

We had no tests for the import csv page, so I added some here because we're working in the same part of the app

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Noted, the methods in these tests make sense to me.
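
For anyone reading along, the general shape of such a test is sketched below. This is not the test added in the PR; the endpoint URL, the base class providing a Flask test client as self.client, and the asserted status code are all assumptions here:

    from unittest import mock

    class ImportCSVViewTestCase(ServerTestCase):  # hypothetical test base class

        @mock.patch("db.dataset.create_from_dict")
        def test_import_csv_invalid_form(self, mock_create_from_dict):
            # Posting an empty form should fail validation and re-render the
            # page rather than creating a dataset.
            resp = self.client.post("/datasets/import", data={})
            self.assertEqual(resp.status_code, 200)
            mock_create_from_dict.assert_not_called()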


"public": True,
}
try:
dataset_id = db.dataset.create_from_dict(dataset_dict, current_user.id)
except dataset_validator.ValidationException as e:
raise BadRequest(str(e))
flash.info("Dataset has been imported successfully.")
return redirect(url_for(".view", id=dataset_id))
return redirect(url_for(".view", dataset_id=dataset_id))
@alastair alastair (Contributor) Apr 16, 2019

There was a bug here which caused this page to not load... You probably should have encountered this if you tested this import page. Did you see the issue?

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Yes, I did see that issue; I should have brought it up - apologies. One thing that I noted: this issue only happens when importing CSV files on localhost; importing a CSV on acousticbrainz.org does not produce the redirect error. Do you know why this is?

@alastair alastair (Contributor) Apr 17, 2019

We've not released the change that introduced this bug yet, which is why it's not present on acousticbrainz.org. If you see a problem like this and you're not sure if it's a problem or not, you can always ask us about it, in IRC, or a ticket comment, or open a new ticket.

@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Okay makes sense, thanks for clearing that up! If I'm confused about a problem like that I'll be sure to ask about it along the way.
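
To illustrate the bug being discussed, here is a standalone Flask sketch (not this project's code): the keyword passed to url_for has to match the URL parameter expected by the view, otherwise the URL cannot be built.

    from flask import Flask, url_for

    app = Flask(__name__)

    @app.route("/datasets/<dataset_id>")
    def view(dataset_id):
        return "dataset %s" % dataset_id

    with app.test_request_context():
        print(url_for("view", dataset_id="1234"))  # builds "/datasets/1234"
        url_for("view", id="1234")  # raises werkzeug.routing.BuildError,
                                    # because "dataset_id" was not supplied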


@alastair alastair (Contributor) commented Apr 16, 2019

@paramsingh a review here on my changes would be great, thanks

@alastair alastair (Contributor) commented Apr 16, 2019

I'm looking at the failing tests - this is weird. I'm not seeing the problem on my machine (even after rebuilding the base image).
It looks like it's to do with the version of werkzeug. This was changed recently:
pallets/werkzeug@ab6150f#diff-28c7ae5b202ea91826688f8703013044L301
We should look at pinning the version of werkzeug in brainzutils or ab to ensure that we get reproducible results
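
For example, pinning the dependency in the requirements file would look something like this; the exact version number below is illustrative only, not the release chosen here:

    # requirements.txt
    Werkzeug==0.14.1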


<pre>description,A dataset of genres<br>description:rock,Songs that fall into the rock genre<br>description:pop,The subset of rock containing 'popular' music</pre>
</figure>
Description lines can go anywhere in the file, but we recommend that you add them at the top.
Any class with a description but will no recordings will not be imported.
@aidanlw17 aidanlw17 (Contributor, Author) Apr 17, 2019

Small language issue on line 27 - should it read "but with no recordings" rather than "but will no recordings"?


paramsingh added a commit to paramsingh/brainzutils-python that referenced this issue Apr 25, 2019
aidanlw17 and others added 5 commits Jun 6, 2019
This addresses the issues noted in ticket AB-387

Import:
- alter _parse_dataset_csv(file) in webserver/views/datasets.py to
  handle dataset and class descriptions on any line of csv file
- csv line format for dataset description:
      description,description of dataset
- csv line format for class description:
      description:class1,description of class 1
- _parse_dataset_csv(file) returns [dataset_description, classes]
  where dataset_description is description of entire dataset and
  classes is a list of all class dictionaries with structure:
    {'recordings': [mbid1,...,mbidn],
    'description': 'class description',
    'name': 'class_name'}
- small change to db/dataset.py function
  create_from_dict(dictionary,author_id) to support class and dataset
  dictionaries containing a description

Export:
- in webserver/views/datasets.py function
  _convert_dataset_to_csv_stringio(dataset) adds dataset description and
  class descriptions to top of csv if they exist
- format:
    description,dataset description
    description:class1,description of class one
    description:class2,description of class two
    ...
    mbid,classname
    ...

Create unit test test_parse_dataset_csv:
- add test_data directory in webserver/views with file test_db.csv
- add test_data path to webserver/testing.py

Altered unit test for test_dataset_to_csv:
- add support for class and dataset descriptions
This commit fixes the if clauses in import_csv()
and _convert_dataset_to_csv_stringio(dataset) as requested
in the previous review of the solution to AB-387.
- Additional documentation on the import csv page
- Tests to check the validity of the import csv form
- Disable CSRF checking in unit tests
- More docstrings for data parsing methods
- Don't create a class which only has a description and no recordings
Parameter name to view was changed but call to url_for wasn't updated
@alastair alastair merged commit cc359f8 into metabrainz:master Jun 6, 2019
1 check passed