Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pop2Piano #21785

Merged
merged 78 commits into from
Aug 21, 2023
Merged

Add Pop2Piano #21785

merged 78 commits into from
Aug 21, 2023

Conversation

susnato
Copy link
Contributor

@susnato susnato commented Feb 24, 2023

What does this PR do?

Adds Pop2Piano model to HuggingFace.

Fixes #20126

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

@susnato susnato changed the title Add Pop2Piano [WIP] Add Pop2Piano Feb 24, 2023
@susnato
Copy link
Contributor Author

susnato commented Mar 13, 2023

Hi @ArthurZucker the implementation is almost ready(also tests) but I feel that the way I implemented this model is not a descent level, thats why I want you to take a look at the Pop2PianoModel structure.
Just to be clear with the feature extractor, the Pop2PianoFeatureExtractor takes raw_audio as input and generates variable length output(10, 50000, 15, 62200), even if I pad the raw_audio at start, it will still produce different results for different audio files, so I used lists to stack them and then wrapped them through BatchFeature.

Please don't mind about docs I will change them afterwards

EDIT : Please ignore this

@susnato susnato force-pushed the pop2piano branch 2 times, most recently from d6f6f7e to e2cbd03 Compare March 18, 2023 18:07
@sweetcocoa
Copy link
Contributor

(Here is the author of pop2piano)
Thank you for doing this PR. It seems that this was implemented by understanding the original code better than me! Please feel free to ask me if there is anything I can check or do.

@susnato susnato force-pushed the pop2piano branch 2 times, most recently from 30b4d9d to 5c81213 Compare March 19, 2023 14:57
@susnato susnato changed the title [WIP] Add Pop2Piano Add Pop2Piano Mar 19, 2023
@susnato susnato marked this pull request as ready for review March 19, 2023 14:59
@susnato
Copy link
Contributor Author

susnato commented Mar 19, 2023

@sweetcocoa Thanks for you comments, HF team has helped me a lot in this integration.

@ArthurZucker
Copy link
Collaborator

For solving the import issues, you have to create a require_xxx with the name of the package. Look for example at the require_accelerate in the testing_utils.py! 😉

@susnato
Copy link
Contributor Author

susnato commented Mar 20, 2023

Hi @ArthurZucker thanks for you comment!
But I have already created require_xxx in testing_utils.py regarding essentia and pretty_midi and also I have used them in transformers/src/transformers/models/pop2piano/__init__.py.

src/transformers/__init__.py Outdated Show resolved Hide resolved
src/transformers/__init__.py Show resolved Hide resolved
src/transformers/models/pop2piano/__init__.py Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/__init__.py Show resolved Hide resolved
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@susnato susnato force-pushed the pop2piano branch 2 times, most recently from b9357cd to 5dd2d02 Compare March 23, 2023 11:13
@susnato
Copy link
Contributor Author

susnato commented Mar 23, 2023

Hi @ArthurZucker sorry for the delay but the tests are green now! please review it.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work already! Left a few comments, mostly let's remove as many dependencies as we can + remove one-liners. Congrats on getting matching logits btw 🔥

docs/source/en/model_doc/pop2piano.mdx Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
tests/models/pop2piano/test_tokenization_pop2piano.py Outdated Show resolved Hide resolved
tests/models/pop2piano/test_tokenization_pop2piano.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work already! Left a few comments, mostly let's remove as many dependencies as we can + remove one-liners. Congrats on getting matching logits btw 🔥

@susnato
Copy link
Contributor Author

susnato commented Apr 10, 2023

Hi @ArthurZucker , sorry for the huge delay, I have made most of the changes that you asked. Also there are some changes that I didn't do these are below -

  1. I managed to remove dependency on soundfile and torchaudio but not librosa, since raw_audio is used 2 times first time in extract_rhythm which takes audio with original sampling_rate and the second time in single_preprocess which first upscales/downscales raw_audio to sampling_rate of 22050 and then uses it. And preloading raw_audio with sampling_rate of 22050(not with native sampling_rate) was giving very bad results! I tried to use scipy.resample but since it uses fft it is relatively slow and less accurate.
  2. As you suggested to pad the feature_extractor outputs with silence, I tried to do that but I found that different audio files with same length have input_features of different shapes! For example one was having shape of [7, 38, 512] and another one of [6, 42, 512], both were 10s audios. I could pad them and use them in a batch but then I need to keep track of their shapes which would introduce another variable, what do you suggest? Should I leave them or try to pad and keep track of shapes?

The licensing page of essentia says - "Essentia is available under an open license, Affero GPLv3, for non-commercial applications, thus it is possible to test the library before deciding to licence it under a comercial licence." I don't know much about licensing so I will leave it upto you to decide what to change in the headings.

Also please forgive me if I missed something, I will change them in the future commit.

@ArthurZucker
Copy link
Collaborator

Hey! Will try to have a look asap

Copy link

@harshsingh32 harshsingh32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine

@ArthurZucker
Copy link
Collaborator

Pinging @sanchit-gandhi for a review too!

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already very clean! Thanks for all you efforts 🔥 will let @sanchit-gandhi do a review too!

docs/source/en/model_doc/pop2piano.mdx Outdated Show resolved Hide resolved
docs/source/en/model_doc/pop2piano.mdx Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
src/transformers/models/pop2piano/modeling_pop2piano.py Outdated Show resolved Hide resolved
@susnato
Copy link
Contributor Author

susnato commented Aug 3, 2023

Hi @ArthurZucker I have pushed the changes you requested except this one(I have asked for some help about that on the respective thread).

BTW the failed circle-ci test is unrelated to this PR.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a lot better thanks!

@susnato
Copy link
Contributor Author

susnato commented Aug 8, 2023

Hi @ArthurZucker I have added the convert_pop2piano_weights_to_hf.py file, added a single vocab and removed the complex logic of _convert_token_to_id and _convert_id_to_token. Also we don't need those extra properties anymore.

Please let me know if any more changes are needed to the tokenizer or not.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot better! Thanks for the simplification

@ArthurZucker
Copy link
Collaborator

cc @sanchit-gandhi for a final review!

Copy link
Contributor

@sanchit-gandhi sanchit-gandhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the huge effort on iterating here @susnato. In particular, the tokenizer code is pretty tricky, where you have to manage different array types and logic for post-processing. But overall it's looking very clean, so nice job!

Overall, I think the high-level API looks very clean and in-keeping with the other transformers audio models. The complexity is buried mainly in the feature extractor and tokenizer, which IMO is fine given how difficult the pre- and post-processing is for this model

Just a few minor suggestions regarding the feature extractor - otherwise I very much agree with the recent round of changes suggested by @ArthurZucker for the tokenizer!

docs/source/en/model_doc/pop2piano.md Outdated Show resolved Hide resolved
docs/source/en/model_doc/pop2piano.md Outdated Show resolved Hide resolved

return mel_specs

def log_mel_spectrogram(self, sequence: np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually feel like this could have been a single function that does the mel spectrogram and takes the logarithm - I personally think thin wrappers around functions hurts readability of code, and is somewhat against the transformers design where we avoid writing multiple small functions in place of a single larger one. But happy to keep it split into two if that's what @amyeroberts prefers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that's part of the design philosophy of transformers! It's true that in some case, particularly forward passes, we like the logic to be more verbose and contained within the method such that the model logic is clear. Generally though, functions should be small and only do one thing. As the comments highlights, readability and clarity are unfortunately subjective :)

The reason I asked was because in code we should write fn do_a and fn do_b and then compose with b(a(x)) instead of fn do_a_and_b. As @sanchit-gandhi mentions, log_mel_spectrogram is effectively a small wrapper, and if I were to implement, I probably would just have mel_spectrogram, and do np.log(x) in the __call__ method. Or have mel_spectrogram as a module level function that the log_mel_spectrogram method calls.

I don't think it matters too much, and we can even revert back to a single log_mel_spectrogram and split things up in the future if we needed to. @susnato - I'm happy to go with whatever you think is best here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like both of your design choices so either way works for me. Personally I would like to keep it as it is and change it in the future(if needed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case if you don't have a preference, let's just do the log transform in the __call__ method: it's easier to add methods than take away in this library because of backwards compatibility considerations

tests/models/pop2piano/test_modeling_pop2piano.py Outdated Show resolved Hide resolved
tests/models/pop2piano/test_tokenization_pop2piano.py Outdated Show resolved Hide resolved
@susnato
Copy link
Contributor Author

susnato commented Aug 11, 2023

Hi @sanchit-gandhi I have pushed the changes.

@sanchit-gandhi
Copy link
Contributor

Hey @susnato! Thanks for pushing the latest round of changes - would you mind resolving any comment threads that you've addressed, so that the reviewer knows what's been implemented and what's still pending? Thanks!

@susnato
Copy link
Contributor Author

susnato commented Aug 15, 2023

Hi @sanchit-gandhi, done.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

Thanks for all the rounds of iterating and being so patient and collaborative whilst we found a design for this model which works in the library. Now the tokenizer design is settled, only thing we need to add are a few more tests to make sure it works as expected and doesn't break in the future. Once that's done I think we're good to merge!

@@ -0,0 +1,190 @@
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.

processor_outputs = processor(audio=input_speech, sampling_rate=sampling_rate, return_tensors="np")

for key in feature_extractor_outputs.keys():
self.assertAlmostEqual(feature_extractor_outputs[key].sum(), processor_outputs[key].sum(), delta=1e-2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a big difference considering they should be doing exactly the same thing 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I have changed it to 1e-4 in the current version.


return mel_specs

def log_mel_spectrogram(self, sequence: np.ndarray):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case if you don't have a preference, let's just do the log transform in the __call__ method: it's easier to add methods than take away in this library because of backwards compatibility considerations


# This is the test for a real music from K-Pop genre.
@slow
def test_real_music(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really more of an integration test for whole pipeline than the tokenizer itself: many things can change upstream which would affect the output. It's a great test though! Could we move it to test_modeling_pop2piano.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@require_torch
@require_pretty_midi
class Pop2PianoTokenizerTest(unittest.TestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense that we can't use TokenizerTesterMixin here. We should add any equivalent tests from that mixin that would apply here though e.g. test_get_vocab, test_save_and_load_tokenizer, test_internal_consistency.

There should also be tests for this tokenizer's specific logic e.g. mapping to / from the token_type and values

Copy link
Contributor Author

@susnato susnato Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should also be tests for this tokenizer's specific logic e.g. mapping to / from the token_type and values

If I am not wrong you are talking about conversion between token_ids and tokens(notes here) right? We already have that in the tests - test_call and test_batch_decode.

In addition to that, I have added the following tests - test_get_vocab, test_save_and_load_tokenizer, test_pickle_tokenizer, test_padding_side_in_kwargs, test_truncation_side_in_kwargs, test_right_and_left_padding, test_right_and_left_truncation, test_padding_to_multiple_of and test_padding_with_attention_mask.

Btw test_internal_consistency is not valid for this tokenizer because the outputs of the Feature Extractor is used in the tokenizer's batch_decode method in order to influence the conversion of the token_ids to notes, whereas the __call__ method converts notes to token_ids without using/generating similar outputs like the feature extractor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not wrong you are talking about conversion between token_ids and tokens(notes here) right? We already have that in the tests - test_call and test_batch_decode.

Yes, that's what I'm talking about. AFAICT, these tests only test the type of the returned objects, not their values. It would be good to have at least one test that makes sure the correct token type is mapped to notes in the relative_tokens_ids_to_notes and notes_to_midi methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amyeroberts, in the recent push, I have extended the test_call to test for accuracy of the outputs of __call__ method and created another test (test_batch_decode_outputs) to check for the accuracy of the outputs of batch_decode method.

Please let me know if any more tests are needed or not.

@susnato
Copy link
Contributor Author

susnato commented Aug 17, 2023

Hi @amyeroberts I have pushed the changes you asked and added more tokenizer tests.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

Because of the structure of the model - it was a lot more complex than the average model PR to integrate this into the library. Thanks for all your work iterating with us and adding this to the library 🤗❤️

)
)

def check_resize_embeddings_Pop2Piano_v1_1(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pep8 - capitalization for function names

Suggested change
def check_resize_embeddings_Pop2Piano_v1_1(
def check_resize_embeddings_pop2piano_v1_1(

@susnato
Copy link
Contributor Author

susnato commented Aug 21, 2023

Hi @amyeroberts I have pushed the change.

@susnato
Copy link
Contributor Author

susnato commented Aug 21, 2023

The checks are green!

Because of the structure of the model - it was a lot more complex than the average model PR to integrate this into the library.

Yes it was relatively hard but I learned a lot about this amazing library from this integration!
Also I will try to maintain the library structure and docs in future PRs! :D

@amyeroberts amyeroberts merged commit 450a181 into huggingface:main Aug 21, 2023
23 checks passed
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* init commit

* config updated also some modeling

* Processor and Model config combined

* extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested

* model loading successful!

* feature extractor done!

* FE can now be called from HF

* postprocessing added in fe file

* same as prev commit

* Pop2PianoConfig doc done

* cfg docs slightly changed

* fe docs done

* batched

* batched working!

* temp

* v1

* checking

* trying to go with generate

* with generate and model tests passed

* before rebasing

* .

* tests done docs done remaining others & nits

* nits

* LogMelSpectogram shifted to FeatureExtractor

* is_tf rmeoved from pop2piano/init

* import solved

* tokenization tests added

* minor fixed regarding modeling_pop2piano

* tokenizer changed to only return midi_object and other changes

* Updated paper abstract(Camera-ready version) (#2)

* more comments and nits

* ruff changes

* code quality fix

* sg comments

* t5 change added and rebased

* comments except batching

* batching done

* comments

* small doc fix

* example removed from modeling

* ckpt

* forward it compatible with fe and generation done

* comments

* comments

* code-quality fix(maybe)

* ckpts changed

* doc file changed from mdx to md

* test fixes

* tokenizer test fix

* changes

* nits done main changes remaining

* code modified

* Pop2PianoProcessor added with tests

* other comments

* added Pop2PianoProcessor to dummy_objects

* added require_onnx to modeling file

* changes

* update .md file

* remove extra line in index.md

* back to the main index

* added pop2piano to index

* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too

* changes

* added return types to 2 tokenizer methods

* the PR build test might work now

* added backends

* PR build fix

* vocab added

* comments

* refactored vocab into 1 file

* added conversion script

* comments

* essentia version changed in .md

* comments

* more tokenizer tests added

* minor fix

* tests extended for outputs acc check

* small fix

---------

Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
* init commit

* config updated also some modeling

* Processor and Model config combined

* extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested

* model loading successful!

* feature extractor done!

* FE can now be called from HF

* postprocessing added in fe file

* same as prev commit

* Pop2PianoConfig doc done

* cfg docs slightly changed

* fe docs done

* batched

* batched working!

* temp

* v1

* checking

* trying to go with generate

* with generate and model tests passed

* before rebasing

* .

* tests done docs done remaining others & nits

* nits

* LogMelSpectogram shifted to FeatureExtractor

* is_tf rmeoved from pop2piano/init

* import solved

* tokenization tests added

* minor fixed regarding modeling_pop2piano

* tokenizer changed to only return midi_object and other changes

* Updated paper abstract(Camera-ready version) (#2)

* more comments and nits

* ruff changes

* code quality fix

* sg comments

* t5 change added and rebased

* comments except batching

* batching done

* comments

* small doc fix

* example removed from modeling

* ckpt

* forward it compatible with fe and generation done

* comments

* comments

* code-quality fix(maybe)

* ckpts changed

* doc file changed from mdx to md

* test fixes

* tokenizer test fix

* changes

* nits done main changes remaining

* code modified

* Pop2PianoProcessor added with tests

* other comments

* added Pop2PianoProcessor to dummy_objects

* added require_onnx to modeling file

* changes

* update .md file

* remove extra line in index.md

* back to the main index

* added pop2piano to index

* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too

* changes

* added return types to 2 tokenizer methods

* the PR build test might work now

* added backends

* PR build fix

* vocab added

* comments

* refactored vocab into 1 file

* added conversion script

* comments

* essentia version changed in .md

* comments

* more tokenizer tests added

* minor fix

* tests extended for outputs acc check

* small fix

---------

Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add pop2piano
8 participants