Add Pop2Piano #21785

susnato · 2023-02-24T13:15:23Z

What does this PR do?

Adds Pop2Piano model to HuggingFace.

Fixes #20126

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

susnato · 2023-03-13T15:47:29Z

Hi @ArthurZucker, the implementation (including tests) is almost ready, but I feel that the way I implemented this model is not at a decent level, which is why I want you to take a look at the Pop2PianoModel structure.
Just to be clear about the feature extractor: Pop2PianoFeatureExtractor takes raw_audio as input and generates variable-length output (10, 50000, 15, 62200). Even if I pad the raw_audio at the start, it will still produce different results for different audio files, so I used lists to stack them and then wrapped them in BatchFeature.

Please don't mind the docs; I will change them afterwards.

EDIT : Please ignore this

sweetcocoa · 2023-03-19T13:18:14Z

(Here is the author of pop2piano)
Thank you for doing this PR. It seems that this was implemented by understanding the original code better than me! Please feel free to ask me if there is anything I can check or do.

susnato · 2023-03-19T15:02:58Z

@sweetcocoa Thanks for your comments; the HF team has helped me a lot with this integration.

ArthurZucker · 2023-03-20T11:01:57Z

For solving the import issues, you have to create a require_xxx with the name of the package. Look for example at the require_accelerate in the testing_utils.py! 😉
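
As a rough illustration of the require_xxx pattern mentioned above, here is a minimal sketch. The helper names is_package_available and require_package are hypothetical and exist only for this example; the actual decorators in testing_utils.py may be implemented differently.

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    # Hypothetical helper: True when the top-level package can be located.
    return importlib.util.find_spec(name) is not None


def require_package(name: str):
    # Hypothetical factory: returns a decorator that skips the test
    # (or test class) when the package is not installed.
    return unittest.skipUnless(is_package_available(name), f"test requires {name}")


# Decorators in the spirit of the ones discussed in this thread.
require_pretty_midi = require_package("pretty_midi")
require_essentia = require_package("essentia")


@require_pretty_midi
class ExampleTest(unittest.TestCase):
    def test_import(self):
        import pretty_midi  # only reached when the package is installed

        self.assertTrue(hasattr(pretty_midi, "PrettyMIDI"))
```

With this shape, a missing optional dependency leads to a skipped test instead of an ImportError at collection time.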

susnato · 2023-03-20T12:24:29Z

Hi @ArthurZucker, thanks for your comment!
But I have already created the require_xxx decorators for essentia and pretty_midi in testing_utils.py, and I have also used them in transformers/src/transformers/models/pop2piano/__init__.py.

src/transformers/__init__.py

src/transformers/models/pop2piano/__init__.py

src/transformers/models/pop2piano/feature_extraction_pop2piano.py

HuggingFaceDocBuilderDev · 2023-03-23T06:21:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

susnato · 2023-03-23T12:37:13Z

Hi @ArthurZucker, sorry for the delay, but the tests are green now! Please review it.

ArthurZucker

Nice work already! Left a few comments, mostly let's remove as many dependencies as we can + remove one-liners. Congrats on getting matching logits btw 🔥

docs/source/en/model_doc/pop2piano.mdx

README.md

src/transformers/models/pop2piano/modeling_pop2piano.py

tests/models/pop2piano/test_feature_extraction_pop2piano.py

tests/models/pop2piano/test_modeling_pop2piano.py

tests/models/pop2piano/test_tokenization_pop2piano.py

susnato · 2023-04-10T11:56:19Z

Hi @ArthurZucker, sorry for the huge delay. I have made most of the changes that you asked for. The changes I did not make are listed below:

I managed to remove the dependency on soundfile and torchaudio but not on librosa. raw_audio is used twice: first in extract_rhythm, which takes the audio at its original sampling_rate, and then in single_preprocess, which first resamples raw_audio to a sampling_rate of 22050 and then uses it. Preloading raw_audio at a sampling_rate of 22050 (instead of the native sampling_rate) gave very bad results! I tried to use scipy.signal.resample, but since it uses FFT it is relatively slow and less accurate.
You suggested padding the feature_extractor outputs with silence. I tried that, but I found that different audio files of the same length have input_features of different shapes! For example, one had a shape of [7, 38, 512] and another [6, 42, 512], although both were 10 s audio clips. I could pad them and use them in a batch, but then I would need to keep track of their shapes, which would introduce another variable. What do you suggest? Should I leave them as they are, or pad them and keep track of the shapes?
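
To make the padding question concrete, here is a minimal sketch of one possible approach: zero-pad every example to the largest shape in the batch and return a boolean mask instead of a separate list of shapes. The function name pad_and_stack is hypothetical, the last dimension is assumed to be identical across examples, and this is not necessarily what the final feature extractor does.

```python
import numpy as np


def pad_and_stack(features, pad_value=0.0):
    """Pad 3-D arrays of shape (n_segments, n_frames, n_bins) to a common shape.

    Returns the stacked batch and a boolean mask marking the real (unpadded)
    positions, so the original shapes can be recovered later.
    """
    shapes = np.array([f.shape for f in features])
    max_shape = shapes.max(axis=0)
    batch = np.full((len(features), *max_shape), pad_value, dtype=features[0].dtype)
    mask = np.zeros((len(features), *max_shape[:2]), dtype=bool)
    for i, feature in enumerate(features):
        n_segments, n_frames, _ = feature.shape
        batch[i, :n_segments, :n_frames, :] = feature
        mask[i, :n_segments, :n_frames] = True
    return batch, mask


# Two dummy inputs with the shapes quoted in the comment above.
first = np.ones((7, 38, 512), dtype=np.float32)
second = np.ones((6, 42, 512), dtype=np.float32)
batch, mask = pad_and_stack([first, second])
```

Here batch has shape (2, 7, 42, 512), and mask[i] marks which segments and frames of example i are real data, which is the extra variable mentioned above expressed as a mask.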

The licensing page of essentia says: "Essentia is available under an open license, Affero GPLv3, for non-commercial applications, thus it is possible to test the library before deciding to licence it under a comercial licence." I don't know much about licensing, so I will leave it up to you to decide what to change in the headings.

Also, please forgive me if I missed something; I will fix it in a future commit.

ArthurZucker · 2023-04-11T17:48:07Z

Hey! Will try to have a look asap

harshsingh32

looks fine

ArthurZucker · 2023-04-24T14:26:35Z

Pinging @sanchit-gandhi for a review too!

ArthurZucker

Already very clean! Thanks for all your efforts 🔥 Will let @sanchit-gandhi do a review too!

docs/source/en/model_doc/pop2piano.mdx

src/transformers/models/pop2piano/configuration_pop2piano.py

src/transformers/models/pop2piano/feature_extraction_pop2piano.py

src/transformers/models/pop2piano/tokenization_pop2piano.py

tests/models/pop2piano/test_modeling_pop2piano.py

src/transformers/models/pop2piano/modeling_pop2piano.py

susnato · 2023-08-03T15:36:10Z

Hi @ArthurZucker, I have pushed the changes you requested except this one (I have asked for some help about that on the respective thread).

BTW the failed circle-ci test is unrelated to this PR.

ArthurZucker

Looks a lot better thanks!

src/transformers/models/pop2piano/tokenization_pop2piano.py

susnato · 2023-08-08T09:24:16Z

Hi @ArthurZucker I have added the convert_pop2piano_weights_to_hf.py file, added a single vocab and removed the complex logic of _convert_token_to_id and _convert_id_to_token. Also we don't need those extra properties anymore.

Please let me know if any more changes are needed to the tokenizer or not.

ArthurZucker

A lot better! Thanks for the simplification

src/transformers/models/pop2piano/tokenization_pop2piano.py

ArthurZucker · 2023-08-08T09:25:55Z

cc @sanchit-gandhi for a final review!

sanchit-gandhi

Thanks for the huge effort on iterating here @susnato. In particular, the tokenizer code is pretty tricky, where you have to manage different array types and logic for post-processing. But overall it's looking very clean, so nice job!

Overall, I think the high-level API looks very clean and in keeping with the other transformers audio models. The complexity is buried mainly in the feature extractor and tokenizer, which IMO is fine given how difficult the pre- and post-processing is for this model.

Just a few minor suggestions regarding the feature extractor - otherwise I very much agree with the recent round of changes suggested by @ArthurZucker for the tokenizer!

docs/source/en/model_doc/pop2piano.md

src/transformers/models/pop2piano/feature_extraction_pop2piano.py

sanchit-gandhi · 2023-08-11T14:06:53Z

src/transformers/models/pop2piano/feature_extraction_pop2piano.py

+
+        return mel_specs
+
+    def log_mel_spectrogram(self, sequence: np.ndarray):


Actually, I feel like this could have been a single function that computes the mel spectrogram and takes the logarithm. I personally think thin wrappers around functions hurt the readability of code, and are somewhat against the transformers design, where we avoid writing multiple small functions in place of a single larger one. But happy to keep it split into two if that's what @amyeroberts prefers.

I don't believe that's part of the design philosophy of transformers! It's true that in some cases, particularly forward passes, we like the logic to be more verbose and contained within the method so that the model logic is clear. Generally though, functions should be small and only do one thing. As the comments highlight, readability and clarity are unfortunately subjective :)

The reason I asked was because in code we should write fn do_a and fn do_b and then compose with b(a(x)) instead of fn do_a_and_b. As @sanchit-gandhi mentions, log_mel_spectrogram is effectively a small wrapper, and if I were to implement it, I would probably just have mel_spectrogram and do np.log(x) in the __call__ method. Or have mel_spectrogram as a module-level function that the log_mel_spectrogram method calls.

I don't think it matters too much, and we can even revert back to a single log_mel_spectrogram and split things up in the future if we needed to. @susnato - I'm happy to go with whatever you think is best here.

I like both of your design choices, so either way works for me. Personally, I would like to keep it as it is and change it in the future (if needed).

OK, in that case if you don't have a preference, let's just do the log transform in the __call__ method: it's easier to add methods than take away in this library because of backwards compatibility considerations
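
The option settled on above (a single mel_spectrogram method, with the log transform applied in __call__) can be sketched roughly as follows. ToyFeatureExtractor, its default parameters, the simple framing scheme, and the 1e-6 clipping floor are all assumptions made for this illustration; this is not the code from feature_extraction_pop2piano.py.

```python
import numpy as np


def hz_to_mel(freq):
    return 2595.0 * np.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_fft, n_mels, sampling_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sampling_rate)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sampling_rate / 2.0), n_mels + 2))
    filters = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        filters[i] = np.maximum(0.0, np.minimum(rising, falling))
    return filters


class ToyFeatureExtractor:
    def __init__(self, sampling_rate=22050, n_fft=512, hop_length=256, n_mels=32):
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.window = np.hanning(n_fft)
        self.filters = mel_filter_bank(n_fft, n_mels, sampling_rate)

    def mel_spectrogram(self, sequence):
        # Frame the signal (it must hold at least n_fft samples), window each
        # frame, and project the power spectrum onto the mel filters.
        n_frames = 1 + (len(sequence) - self.n_fft) // self.hop_length
        frames = np.stack(
            [sequence[i * self.hop_length : i * self.hop_length + self.n_fft] for i in range(n_frames)]
        )
        power = np.abs(np.fft.rfft(frames * self.window, axis=-1)) ** 2
        return power @ self.filters.T  # (n_frames, n_mels)

    def __call__(self, sequence):
        # The log transform lives here instead of in a separate
        # log_mel_spectrogram wrapper; the clip avoids log(0).
        mel = self.mel_spectrogram(sequence)
        return np.log(np.clip(mel, 1e-6, None))
```

Keeping only mel_spectrogram public leaves room to add a dedicated log method later without having to remove anything, which is the backwards-compatibility argument made in the comment above.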

src/transformers/models/pop2piano/processing_pop2piano.py

src/transformers/models/pop2piano/tokenization_pop2piano.py

tests/models/pop2piano/test_modeling_pop2piano.py

tests/models/pop2piano/test_processor_pop2piano.py

tests/models/pop2piano/test_tokenization_pop2piano.py

susnato · 2023-08-11T16:29:13Z

Hi @sanchit-gandhi I have pushed the changes.

sanchit-gandhi · 2023-08-15T14:03:32Z

Hey @susnato! Thanks for pushing the latest round of changes - would you mind resolving any comment threads that you've addressed, so that the reviewer knows what's been implemented and what's still pending? Thanks!

susnato · 2023-08-15T14:34:26Z

Hi @sanchit-gandhi, done.

docs/source/en/model_doc/pop2piano.md

amyeroberts

Amazing work!

Thanks for all the rounds of iterating and being so patient and collaborative whilst we found a design for this model which works in the library. Now the tokenizer design is settled, only thing we need to add are a few more tests to make sure it works as expected and doesn't break in the future. Once that's done I think we're good to merge!

amyeroberts · 2023-08-16T12:04:41Z

src/transformers/models/pop2piano/convert_pop2piano_weights_to_hf.py

@@ -0,0 +1,190 @@
+# Copyright 2022 The HuggingFace Inc. team. All rights reserved.


Suggested change

-# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2023 The HuggingFace Inc. team. All rights reserved.

amyeroberts · 2023-08-16T12:13:44Z

tests/models/pop2piano/test_processor_pop2piano.py

+        processor_outputs = processor(audio=input_speech, sampling_rate=sampling_rate, return_tensors="np")
+
+        for key in feature_extractor_outputs.keys():
+            self.assertAlmostEqual(feature_extractor_outputs[key].sum(), processor_outputs[key].sum(), delta=1e-2)


This is quite a big difference considering they should be doing exactly the same thing 🤔

Sorry, I have changed it to 1e-4 in the current version.
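
For context on what the two tolerances mean, here is a small self-contained sketch of how the delta argument of assertAlmostEqual behaves. The arrays and the size of the drift are made up for illustration and are unrelated to the real feature extractor outputs.

```python
import unittest

import numpy as np


class DeltaExample(unittest.TestCase):
    def test_delta(self):
        reference = np.full(1000, 0.5, dtype=np.float64)
        drifted = reference + 5e-6  # the two sums differ by roughly 5e-3

        # Passes: |difference of sums| <= 1e-2.
        self.assertAlmostEqual(reference.sum(), drifted.sum(), delta=1e-2)

        # Fails with the tighter tolerance: |difference of sums| > 1e-4.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(reference.sum(), drifted.sum(), delta=1e-4)
```

A per-element drift this small goes unnoticed with delta=1e-2 but is caught with delta=1e-4.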

amyeroberts · 2023-08-16T12:19:32Z

src/transformers/models/pop2piano/feature_extraction_pop2piano.py

+
+        return mel_specs
+
+    def log_mel_spectrogram(self, sequence: np.ndarray):


OK, in that case if you don't have a preference, let's just do the log transform in the __call__ method: it's easier to add methods than take away in this library because of backwards compatibility considerations

amyeroberts · 2023-08-16T12:37:02Z

tests/models/pop2piano/test_tokenization_pop2piano.py

+
+    # This is the test for a real music from K-Pop genre.
+    @slow
+    def test_real_music(self):


This is really more of an integration test for whole pipeline than the tokenizer itself: many things can change upstream which would affect the output. It's a great test though! Could we move it to test_modeling_pop2piano.py?

amyeroberts · 2023-08-16T12:42:20Z

tests/models/pop2piano/test_tokenization_pop2piano.py

+
+@require_torch
+@require_pretty_midi
+class Pop2PianoTokenizerTest(unittest.TestCase):


It makes sense that we can't use TokenizerTesterMixin here. We should add any equivalent tests from that mixin that would apply here though e.g. test_get_vocab, test_save_and_load_tokenizer, test_internal_consistency.

There should also be tests for this tokenizer's specific logic e.g. mapping to / from the token_type and values

> There should also be tests for this tokenizer's specific logic e.g. mapping to / from the token_type and values

If I am not wrong, you are talking about conversion between token_ids and tokens (notes here), right? We already have that in the tests - test_call and test_batch_decode.

In addition to that, I have added the following tests - test_get_vocab, test_save_and_load_tokenizer, test_pickle_tokenizer, test_padding_side_in_kwargs, test_truncation_side_in_kwargs, test_right_and_left_padding, test_right_and_left_truncation, test_padding_to_multiple_of and test_padding_with_attention_mask.

Btw, test_internal_consistency is not valid for this tokenizer, because the outputs of the feature extractor are used in the tokenizer's batch_decode method to influence the conversion of the token_ids to notes, whereas the __call__ method converts notes to token_ids without using or generating outputs like those of the feature extractor.

> If I am not wrong, you are talking about conversion between token_ids and tokens (notes here), right? We already have that in the tests - test_call and test_batch_decode.

Yes, that's what I'm talking about. AFAICT, these tests only test the type of the returned objects, not their values. It would be good to have at least one test that makes sure the correct token type is mapped to notes in the relative_tokens_ids_to_notes and notes_to_midi methods

Hi @amyeroberts, in the most recent push I have extended test_call to check the accuracy of the outputs of the __call__ method and created another test (test_batch_decode_outputs) to check the accuracy of the outputs of the batch_decode method.

Please let me know if any more tests are needed or not.
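
As an illustration of the kind of value-level check requested in this thread (as opposed to checking only the returned types), here is a toy sketch. ToyTokenizer, the "{value}_{token_type}" vocabulary format, and the vocabulary sizes are assumptions invented for the example; they are not taken from tokenization_pop2piano.py.

```python
import unittest

TOKEN_TYPES = ("TOKEN_SPECIAL", "TOKEN_NOTE", "TOKEN_VELOCITY", "TOKEN_TIME")


def build_vocab(sizes):
    # Single flat vocabulary: "<value>_<token_type>" -> id, in TOKEN_TYPES order.
    vocab = {}
    for token_type in TOKEN_TYPES:
        for value in range(sizes[token_type]):
            vocab[f"{value}_{token_type}"] = len(vocab)
    return vocab


class ToyTokenizer:
    def __init__(self, sizes):
        self.encoder = build_vocab(sizes)
        self.decoder = {token_id: token for token, token_id in self.encoder.items()}

    def convert_token_to_id(self, value, token_type):
        return self.encoder[f"{value}_{token_type}"]

    def convert_id_to_token(self, token_id):
        value, token_type = self.decoder[token_id].split("_", 1)
        return token_type, int(value)


class ToyTokenizerValueTest(unittest.TestCase):
    def setUp(self):
        sizes = {"TOKEN_SPECIAL": 4, "TOKEN_NOTE": 128, "TOKEN_VELOCITY": 2, "TOKEN_TIME": 100}
        self.tokenizer = ToyTokenizer(sizes)

    def test_values_not_only_types(self):
        # Check the exact ids, not just that an int comes back.
        self.assertEqual(self.tokenizer.convert_token_to_id(60, "TOKEN_NOTE"), 64)
        self.assertEqual(self.tokenizer.convert_token_to_id(1, "TOKEN_VELOCITY"), 133)
        self.assertEqual(self.tokenizer.convert_token_to_id(10, "TOKEN_TIME"), 144)

    def test_round_trip(self):
        for token_type, value in [("TOKEN_NOTE", 60), ("TOKEN_VELOCITY", 1), ("TOKEN_TIME", 10)]:
            token_id = self.tokenizer.convert_token_to_id(value, token_type)
            self.assertEqual(self.tokenizer.convert_id_to_token(token_id), (token_type, value))
```

The first test pins down concrete ids, so a change in how token types are laid out in the vocabulary would be caught even if the return types stay the same.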

susnato · 2023-08-17T13:48:51Z

Hi @amyeroberts, I have pushed the changes you asked for and added more tokenizer tests.

amyeroberts

Amazing work!

Because of the structure of the model - it was a lot more complex than the average model PR to integrate this into the library. Thanks for all your work iterating with us and adding this to the library 🤗❤️

amyeroberts · 2023-08-21T14:47:49Z

tests/models/pop2piano/test_modeling_pop2piano.py

+                    )
+                )
+
+    def check_resize_embeddings_Pop2Piano_v1_1(


nit: pep8 - capitalization for function names

Suggested change

-    def check_resize_embeddings_Pop2Piano_v1_1(
+    def check_resize_embeddings_pop2piano_v1_1(

susnato · 2023-08-21T15:05:03Z

Hi @amyeroberts I have pushed the change.

susnato · 2023-08-21T15:27:39Z

The checks are green!

> Because of the structure of the model - it was a lot more complex than the average model PR to integrate this into the library.

Yes, it was relatively hard, but I learned a lot about this amazing library from this integration!
Also, I will try to maintain the library structure and docs in future PRs! :D

* init commit * config updated also some modeling * Processor and Model config combined * extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested * model loading successful! * feature extractor done! * FE can now be called from HF * postprocessing added in fe file * same as prev commit * Pop2PianoConfig doc done * cfg docs slightly changed * fe docs done * batched * batched working! * temp * v1 * checking * trying to go with generate * with generate and model tests passed * before rebasing * . * tests done docs done remaining others & nits * nits * LogMelSpectogram shifted to FeatureExtractor * is_tf rmeoved from pop2piano/init * import solved * tokenization tests added * minor fixed regarding modeling_pop2piano * tokenizer changed to only return midi_object and other changes * Updated paper abstract(Camera-ready version) (#2) * more comments and nits * ruff changes * code quality fix * sg comments * t5 change added and rebased * comments except batching * batching done * comments * small doc fix * example removed from modeling * ckpt * forward it compatible with fe and generation done * comments * comments * code-quality fix(maybe) * ckpts changed * doc file changed from mdx to md * test fixes * tokenizer test fix * changes * nits done main changes remaining * code modified * Pop2PianoProcessor added with tests * other comments * added Pop2PianoProcessor to dummy_objects * added require_onnx to modeling file * changes * update .md file * remove extra line in index.md * back to the main index * added pop2piano to index * Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too * changes * added return types to 2 tokenizer methods * the PR build test might work now * added backends * PR build fix * vocab added * comments * refactored vocab into 1 file * added conversion script * comments * essentia version changed in .md * comments * more tokenizer tests added * minor fix * tests extended for 
outputs acc check * small fix --------- Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>

susnato changed the title ~~Add Pop2Piano~~ [WIP] Add Pop2Piano Feb 24, 2023

susnato force-pushed the pop2piano branch 2 times, most recently from d6f6f7e to e2cbd03 Compare March 18, 2023 18:07

susnato force-pushed the pop2piano branch 2 times, most recently from 30b4d9d to 5c81213 Compare March 19, 2023 14:57

susnato changed the title ~~[WIP] Add Pop2Piano~~ Add Pop2Piano Mar 19, 2023

susnato marked this pull request as ready for review March 19, 2023 14:59

susnato force-pushed the pop2piano branch from 5c81213 to 61c2fbf Compare March 19, 2023 17:46

ArthurZucker reviewed Mar 20, 2023

susnato force-pushed the pop2piano branch from e5d2aec to 4d9fcc3 Compare March 23, 2023 06:02

susnato force-pushed the pop2piano branch 2 times, most recently from b9357cd to 5dd2d02 Compare March 23, 2023 11:13

susnato requested a review from ArthurZucker March 23, 2023 18:14

ArthurZucker reviewed Mar 28, 2023

susnato force-pushed the pop2piano branch from 5dd2d02 to 941255a Compare April 10, 2023 11:05

susnato requested a review from ArthurZucker April 11, 2023 14:28

harshsingh32 reviewed Apr 24, 2023

ArthurZucker reviewed Apr 26, 2023

susnato force-pushed the pop2piano branch from ec68fa6 to 6d28b21 Compare April 28, 2023 18:50

ArthurZucker reviewed Aug 4, 2023

susnato added 2 commits August 8, 2023 12:42

refactored vocab into 1 file

48098cf

added conversion script

1dba26a

susnato requested a review from ArthurZucker August 8, 2023 09:24

ArthurZucker reviewed Aug 8, 2023

sanchit-gandhi approved these changes Aug 11, 2023

comments

c13437f

sweetcocoa reviewed Aug 16, 2023

essentia version changed in .md

f64d9bb

amyeroberts reviewed Aug 16, 2023

susnato added 3 commits August 16, 2023 21:54

comments

3d03761

more tokenizer tests added

f32e959

minor fix

24cb7b5

susnato requested a review from amyeroberts August 17, 2023 13:50

tests extended for outputs acc check

ca3bb6a

amyeroberts approved these changes Aug 21, 2023

small fix

d6aaeca

amyeroberts merged commit 450a181 into huggingface:main Aug 21, 2023
23 checks passed

watreyoung mentioned this pull request Oct 20, 2023

How do I obtain details from a Pull Request? PyGithub/PyGithub#2204

Open

