
Add support for nomic-ai/nomic-embed-text-v1.5 model #1874

Open · wants to merge 31 commits into main

Conversation

@bhavika commented May 24, 2024

What does this PR do?

Adds support for nomic-embed-text-v1.5, which is a variant of BERT.
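For reviewers, the core of the change is an ONNX export config for the NomicBert architecture (the NomicBertOnnxConfig that shows up in the logs below). A minimal sketch of the idea, assuming the config can simply reuse the BERT one — a simplification, not the PR's exact diff:

from optimum.exporters.onnx.model_configs import BertOnnxConfig

# Sketch only: NomicBert consumes the same inputs as BERT
# (input_ids, attention_mask, token_type_ids), so the ONNX config
# can inherit from the existing BertOnnxConfig.
class NomicBertOnnxConfig(BertOnnxConfig):
    # Assumption: the traced rotary-embedding ops want a recent opset.
    DEFAULT_ONNX_OPSET = 14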

I've tested this PR using the following script:

import numpy as np
from transformers import AutoTokenizer, AutoModel
from pathlib import Path
from optimum.exporters import TasksManager
from optimum.exporters.onnx import export
import onnx
from optimum.exporters.onnx import validate_model_outputs
from optimum.onnxruntime import ORTModelForFeatureExtraction
import torch

if __name__ == "__main__":
    MODEL_NAME = "nomic-ai/nomic-embed-text-v1.5"
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt")

    # Load the remote-code NomicBert model at a pinned revision.
    hf_model = AutoModel.from_pretrained(
        MODEL_NAME,
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    # Build the ONNX export config registered for this architecture, then export.
    onnx_path = Path("tests/nomic_bert.onnx")
    onnx_config_constructor = TasksManager.get_exporter_config_constructor(
        "onnx", hf_model, task="feature-extraction", library_name="transformers"
    )
    onnx_config = onnx_config_constructor(hf_model.config)
    print(onnx_config)
    onnx_inputs, onnx_outputs = export(hf_model, onnx_config, onnx_path, onnx_config.DEFAULT_ONNX_OPSET)

    print("After export")
    print(onnx_inputs)
    print(onnx_outputs)

    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)

    validate_model_outputs(onnx_config, hf_model, onnx_path, onnx_outputs, onnx_config.ATOL_FOR_VALIDATION)

    model = ORTModelForFeatureExtraction.from_pretrained(
        MODEL_NAME,
        file_name="onnx/model.onnx",
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    optimum_model_outputs = model(**inputs)
    print(optimum_model_outputs.last_hidden_state)

    hf_model_outputs = hf_model(**inputs)
    print(hf_model_outputs.last_hidden_state)

    print("Are inputs from both models close?", np.allclose(optimum_model_outputs.last_hidden_state.cpu().detach().numpy(), hf_model_outputs.last_hidden_state.cpu().detach().numpy(), rtol=1e-3, atol=1e-3))

which yields:

❯ python tests/nomic_bert.py
/Users/bhavika/src/github.com/huggingface/optimum/.venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
<All keys matched successfully>
<optimum.exporters.onnx.model_configs.NomicBertOnnxConfig object at 0x131bc5c10>
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:621: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seqlen > self._seq_len_cached:
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert ro_dim <= x.shape[-1]
After export
['input_ids', 'attention_mask', 'token_type_ids']
['last_hidden_state']
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Validating ONNX model tests/nomic_bert.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[✓] all values close (atol: 0.0001)
The argument `trust_remote_code` is to be used along with export=True. It will be ignored.
The ONNX file onnx/model.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
tensor([[[ 1.1261, -0.4374, -3.6442,  ..., -0.6255, -0.1797, -0.4537],
         [ 1.0806, -0.7503, -2.3863,  ..., -0.2730, -0.1225, -0.2811],
         [ 0.8267, -0.6160, -1.9121,  ..., -0.5175, -0.0665,  0.4226],
         ...,
         [ 1.5216, -0.2706, -3.2862,  ..., -0.3274, -0.7411,  0.0825],
         [ 1.2570, -0.2871, -4.0211,  ..., -0.1678, -0.7790,  0.0134],
         [ 1.2286, -0.2899, -4.0887,  ..., -0.1727, -0.7650, -0.0908]]])
tensor([[[ 1.1261, -0.4374, -3.6442,  ..., -0.6255, -0.1797, -0.4537],
         [ 1.0806, -0.7503, -2.3863,  ..., -0.2730, -0.1225, -0.2811],
         [ 0.8267, -0.6160, -1.9121,  ..., -0.5175, -0.0665,  0.4226],
         ...,
         [ 1.5216, -0.2706, -3.2862,  ..., -0.3274, -0.7411,  0.0825],
         [ 1.2570, -0.2871, -4.0211,  ..., -0.1678, -0.7790,  0.0134],
         [ 1.2286, -0.2899, -4.0887,  ..., -0.1727, -0.7650, -0.0908]]],
       grad_fn=<NativeLayerNormBackward0>)
Are outputs from both models close? True

CLI exporter

optimum-cli export onnx -m nomic-ai/nomic-embed-text-v1.5 nomic_onnx_optimum --trust-remote-code

gives me very different results, and these vary on every run, so the diff is sometimes very large:

❯ optimum-cli export onnx -m nomic-ai/nomic-embed-text-v1.5 nomic_onnx_optimum --trust-remote-code
Framework not specified. Using pt to export the model.
/Users/bhavika/src/github.com/huggingface/optimum/.venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
<All keys matched successfully>
Automatic task detection to feature-extraction (possible synonyms are: default, image-feature-extraction, mask-generation, sentence-similarity).
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:621: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seqlen > self._seq_len_cached:
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert ro_dim <= x.shape[-1]
Post-processing the exported models...
Deduplicating shared (tied) weights...

Validating ONNX model nomic_onnx_optimum/model.onnx...
        -[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
        - Validating ONNX Model output "token_embeddings":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[x] values not close enough, max diff: 7.82012939453125e-05 (atol: 1e-05)
        - Validating ONNX Model output "sentence_embedding":
                -[✓] (2, 768) matches (2, 768)
                -[x] values not close enough, max diff: 4.673004150390625e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 7.82012939453125e-05
- sentence_embedding: max diff = 4.673004150390625e-05.
 The exported model was saved at: nomic_onnx_optimum

@xenova any thoughts on what I should check here? I'll test longer sequence lengths too, but this difference has me worried.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@xenova
@xenova (Contributor) commented May 24, 2024

Thanks for this! 🤗 Can you confirm that the exported model produces the same results as the python version for 2048 < context length <= 8192? That would be very helpful!

@bhavika (Author) commented May 24, 2024 via email

@bhavika bhavika marked this pull request as draft May 24, 2024 18:05
@bhavika bhavika marked this pull request as ready for review May 29, 2024 05:48
@bhavika (Author) commented May 29, 2024

Optimum environment information for debugging:


I was wondering if it has anything to do with CUDA being available, but I see the same thing happen on a g2-standard-4 as well.

- `optimum` version: 1.20.0.dev0
- `transformers` version: 4.40.2
- Platform: Linux-5.10.0-26-cloud-amd64-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.23.2
- PyTorch version (GPU?): 2.3.0+cu121 (cuda available: True)
- Tensorflow version (GPU?): not installed (cuda available: NA)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 3.6067795008420944e-05
- sentence_embedding: max diff = 2.1696090698242188e-05.
 The exported model was saved at: nomic_onnx_optimum
(optimum) (base) gcpuser@bt-f609-head-e6t0afg4-compute:~/optimum$ optimum-cli export onnx -m nomic-ai/nomic-embed-text-v1.5 nomic_onnx_optimum --trust-remote-code
Framework not specified. Using pt to export the model.
/home/gcpuser/optimum/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
<All keys matched successfully>
Automatic task detection to feature-extraction (possible synonyms are: default, image-feature-extraction, mask-generation, sentence-similarity).
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.3.0+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
/home/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:621: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seqlen > self._seq_len_cached:
/home/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:574: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  seqlen > self._seq_len_cached
/home/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert ro_dim <= x.shape[-1]
Post-processing the exported models...
Deduplicating shared (tied) weights...

Validating ONNX model nomic_onnx_optimum/model.onnx...
        -[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
        - Validating ONNX Model output "token_embeddings":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[x] values not close enough, max diff: 7.593631744384766e-05 (atol: 1e-05)
        - Validating ONNX Model output "sentence_embedding":
                -[✓] (2, 768) matches (2, 768)
                -[✓] all values close (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 7.593631744384766e-05.
 The exported model was saved at: nomic_onnx_optimum

@bhavika (Author) commented May 29, 2024

Testing with longer texts:

import numpy as np
from transformers import AutoTokenizer, AutoModel
from pathlib import Path
from optimum.exporters import TasksManager
from optimum.exporters.onnx import export
import onnx
from optimum.exporters.onnx import validate_model_outputs
from optimum.onnxruntime import ORTModelForFeatureExtraction
import torch

# text generated by GPT-4o
long_input = """
The journey of video games is a fascinating narrative that has seen tremendous evolution since its inception in the mid-20th century. From rudimentary beginnings to today's intricate, immersive experiences, the video game industry's evolution has been marked by technological advancements, cultural shifts, and expanding audiences. The history of video games can be traced back to the 1950s, with the development of early electronic games. One of the earliest examples of a video game is "Tennis for Two," created by physicist William Higinbotham in 1958. This rudimentary tennis simulation displayed on an oscilloscope marked the beginning of interactive electronic entertainment, showcasing the potential of electronic systems to engage users in playful activities.

In the 1960s, the development of more sophisticated games on mainframe computers began to take shape. MIT's "Spacewar!" (1962) is often credited as the first influential digital game. Created by Steve Russell, Martin Graetz, and Wayne Wiitanen, "Spacewar!" featured two spaceships engaged in a dogfight, complete with gravity wells and hyperspace jumps. Although it was not commercially released, it was distributed among computer research labs, influencing future game developers and laying the groundwork for what video games could become.

The 1970s saw a significant turning point with the creation of "Pong" by Atari in 1972. This simple yet addictive tennis game transformed the gaming landscape by introducing arcade machines to the public, establishing the foundation of the commercial video game industry. These early arcade games captivated audiences and laid essential groundwork for future developments. The late 1970s and 1980s are often referred to as the Golden Age of video games, marked by rapid advancements in computing power and display technology. Color graphics and improved sound capabilities became standard, significantly enhancing the user experience. During this era, iconic arcade games like "Space Invaders" (1978), "Pac-Man" (1980), and "Donkey Kong" (1981) became cultural phenomena. These titles were not only entertaining but also demonstrated the potential for video games to be cultural touchstones.

Simultaneously, the home console market began to develop. The release of the Atari 2600 in 1977 was a pivotal moment, allowing players to bring the arcade experience home. The Atari 2600's success spurred other companies to enter the market, leading to a proliferation of consoles and game titles. However, the industry experienced rapid growth but was not without challenges. The video game crash of 1983 was a significant downturn caused by market saturation, poor-quality games, and competition from home computers. Many companies went bankrupt, and the future of video games appeared uncertain.

The industry's revival came with the release of the Nintendo Entertainment System (NES) in 1985. Nintendo's strict quality control and the introduction of iconic franchises like "Super Mario Bros." and "The Legend of Zelda" restored consumer confidence. The NES became a global success and set new standards for home gaming. Nintendo's success encouraged competition, notably from Sega, which released the Sega Genesis in 1988. The Genesis touted superior graphics and processing power, pushing both companies to innovate and leading to the creation of memorable characters such as Sonic the Hedgehog.

The late 1980s also saw the emergence of handheld gaming with the release of the Nintendo Game Boy in 1989. The Game Boy's portability and a library of popular games, including "Tetris," made it an instant hit and established handheld gaming as a significant market segment. The early 1990s ushered in the era of 16-bit consoles, with the Super Nintendo Entertainment System (SNES) and the Sega Genesis leading the charge, offering enhanced graphics, more complex gameplay, and larger game libraries. Memorable titles such as "The Legend of Zelda: A Link to the Past" and "Sonic the Hedgehog 2" became classics of the era.

The mid-1990s brought CD-ROM technology, allowing for larger game files and more intricate multimedia experiences. The Sega CD and the Sony PlayStation were among the pioneers in utilizing this technology, paving the way for more expansive and cinematic games. One of the most significant technological leaps in the history of video games was the transition from 2D to 3D graphics in the mid-1990s. Games like "Super Mario 64" (1996) and "The Legend of Zelda: Ocarina of Time" (1998) revolutionized gameplay by introducing fully three-dimensional environments. These titles not only showcased the capabilities of the Nintendo 64 but also set new standards for game design.

Sony's entry into the gaming market with the PlayStation in 1994 marked a significant shift in the industry. The PlayStation appealed to a broader audience, emphasizing mature content and third-party developer support. Games like "Final Fantasy VII" and "Metal Gear Solid" pushed the boundaries of storytelling and graphical fidelity. Microsoft entered the gaming fray with the release of the Xbox in 2001, introducing online gaming with Xbox Live and setting the stage for the future of multiplayer gaming. Iconic franchises like "Halo" emerged, cementing the Xbox as a formidable competitor.

The 2000s saw the rapid expansion of online gaming, with PC titles like "World of Warcraft" (2004) and console games like "Halo 2" (2004) demonstrating the potential of online multiplayer experiences. Online gaming fostered social communities and competitive play, transforming how players interacted with each other. Massively Multiplayer Online Role-Playing Games (MMORPGs) became a dominant genre. "World of Warcraft" became a cultural phenomenon, attracting millions of subscribers and maintaining a dedicated player base for over a decade. The success of MMORPGs highlighted the potential for games to be ever-evolving, persistent worlds.

The mid-2000s saw the rise of casual gaming, driven by platforms like Facebook and mobile devices. Games like "FarmVille" and "Angry Birds" reached broader audiences, introducing gaming to people who had never considered themselves gamers. The accessibility and simplicity of these games contributed to their widespread appeal. The release of the Nintendo Wii in 2006 brought another wave of innovation. The Wii's motion controls and intuitive interface appealed to a wide demographic, including families and elderly players, showcasing that gaming could be accessible and physically engaging.

The 2010s introduced the next generation of consoles, including the PlayStation 4, Xbox One, and Nintendo Switch. These consoles boasted impressive hardware capabilities, enabling lifelike graphics, expansive open worlds, and seamless online experiences. Esports emerged as a major industry, with games like "League of Legends," "Dota 2," and "Overwatch" drawing massive audiences and offering substantial prize pools. Esports athletes gained recognition akin to traditional sports stars, further legitimizing gaming as a competitive pursuit.

Independent game developers gained prominence, thanks to digital distribution platforms like Steam and the accessibility of game development tools. Indie titles like "Minecraft," "Undertale," and "Celeste" found success by offering unique gameplay experiences and innovative narratives. Advancements in Virtual Reality (VR) and Augmented Reality (AR) technology opened new dimensions for gaming. Devices like the Oculus Rift, PlayStation VR, and the AR game "Pokémon GO" offered immersive and interactive experiences that pushed the boundaries of traditional gaming.

The rise of streaming platforms like Twitch and YouTube Gaming transformed how games were consumed. Gamers became influencers, showcasing their skills, entertaining viewers, and building dedicated communities. This shift highlighted the social and performative aspects of gaming. From humble beginnings to a global cultural phenomenon, the evolution of video games is a testament to technological innovation, creative storytelling, and societal impact. The industry has continually adapted and grown, embracing new technologies and reaching broader audiences.

As we look to the future, the potential for video games to further evolve and shape our world remains boundless. Whether through advancements in AI, increased realism, or new forms of interactivity, the journey of video games is a dynamic and ever-changing narrative that will continue to captivate and inspire generations to come. The cultural significance of video games cannot be overstated. What began as simple amusements has now become an integral part of modern entertainment culture. Today, gaming influences not only the entertainment industry but also education, art, and social interactions. Video games offer a medium for storytelling and artistic expression, where developers can create rich, interactive narratives and worlds that engage players on multiple levels.

Traditional gameplay mechanics have become platforms for complex storytelling. Titles like "The Last of Us" and "Red Dead Redemption" serve as exemplary cases where the narrative drive is as compelling as the gameplay. The characters are well-developed, the plots intricate, and the worlds immersive, offering experiences akin to interactive movies.

Furthermore, video games have also contributed to social change and awareness. Games like "Hellblade: Senua's Sacrifice" tackle mental health issues, offering players insight into the experiences of individuals living with conditions such as psychosis. This ability to simulate different perspectives and experiences allows players to develop empathy and understanding of complex issues.

The educational potential of video games is increasingly being recognized. Gamification of learning provides an engaging way to understand complex subjects. Games like "Minecraft: Education Edition" have been used in classrooms to teach subjects ranging from history to coding. Virtual reality (VR) simulations allow medical students to practice surgeries or pilots to simulate flying before ever stepping into a real cockpit.

Augmented Reality (AR) games such as "Pokémon GO" have further blurred the lines between virtual and real worlds. These games bring digital components into our physical world, encouraging outdoor activity and social interaction. Imagine future applications where AR can assist in navigation, provide real-time data overlays, and even transform educational field trips into interactive learning experiences.

Technological advancements continue to stretch the limits of what is possible in gaming. Artificial Intelligence (AI) is improving non-player characters (NPCs) to act more human, creating more challenging and realistic interactions. Procedural generation algorithms are constructing vast, explorable worlds, offering something new with every game session. Ray tracing technology is making lighting in games more realistic than ever, and haptic feedback is adding a new layer of sensory immersion.

Mobile gaming, too, has grown exponentially. The accessibility of smartphones has brought gaming to almost everyone. Mobile games now encompass a wide variety of genres and complexities, from simple time-killers to complex strategy games and even MMORPGs.

The monetization strategies within the gaming industry have also evolved, with free-to-play models becoming increasingly common. These games generate revenue through advertisements and microtransactions. While this model has been met with some criticism, it has also allowed for the creation of expansive, continually updated games that remain free to play.

Cloud gaming appears to be the next frontier. Services like Google Stadia, NVIDIA GeForce Now, and Microsoft’s Xbox Cloud Gaming are experimenting with creating platforms where games are streamed over the internet, eliminating the need for powerful local hardware. If successful, this could revolutionize access to high-quality gaming experiences, making them accessible to a broader audience.

As the gaming industry expands, it also faces significant challenges. Issues like crunch culture—where developers are required to work excessively long hours to meet deadlines—have come under increasing scrutiny. The industry is beginning to recognize the importance of sustainable work practices and the value of mental health, though more progress is needed.

Inclusivity is another critical issue. Historically, video games have been male-dominated both in terms of the workforce and the target audience. However, this is changing as the industry strives for greater representation and inclusivity both in the games being produced and within the workforce.

Finally, the role of regulation is becoming more critical as video games become ubiquitous. Issues such as data privacy, online harassment, and gaming addiction are subjects of increasing concern. Governments and industry bodies must work together to create frameworks that protect players and address these challenges without stifling innovation.

In conclusion, the evolution of video games is a remarkable story of technological innovation, creative storytelling, and cultural impact. From simple beginnings to today's complex, immersive experiences, video games have become an integral part of modern life. They entertain, educate, and even inspire social change. As technology continues to advance, the potential for video games to further shape our world is boundless. The journey of video games is dynamic and ever-changing, promising to captivate and inspire generations to come.

Whether through advancements in AI, increased realism, or new forms of interactivity, the industry is poised to continue its growth and influence. As video games evolve, their impact on culture, art, education, and society will undoubtedly deepen, solidifying their place as one of the most important forms of modern entertainment and expression. The future of video games is a bright and exciting frontier, and we can only look forward to the innovations and transformations that lie ahead.
"""


if __name__ == "__main__":
    MODEL_NAME = "nomic-ai/nomic-embed-text-v1.5"
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt", padding=True, truncation=True)
    long_inputs = tokenizer(long_input, return_tensors="pt", padding=True, truncation=True)

    hf_model = AutoModel.from_pretrained(
        MODEL_NAME,
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    onnx_path = Path("tests/nomic_bert.onnx")
    onnx_config_constructor = TasksManager.get_exporter_config_constructor(
        "onnx", hf_model, task="feature-extraction", library_name="transformers"
    )
    onnx_config = onnx_config_constructor(hf_model.config)
    print(onnx_config)
    onnx_inputs, onnx_outputs = export(hf_model, onnx_config, onnx_path, onnx_config.DEFAULT_ONNX_OPSET)

    print("After export")
    print(onnx_inputs)
    print(onnx_outputs)

    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)

    validate_model_outputs(onnx_config, hf_model, onnx_path, onnx_outputs, onnx_config.ATOL_FOR_VALIDATION)

    model = ORTModelForFeatureExtraction.from_pretrained(
        MODEL_NAME,
        file_name="onnx/model.onnx",
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    optimum_model_outputs = model(**inputs)
    print(optimum_model_outputs.last_hidden_state)

    hf_model_outputs = hf_model(**inputs)
    print(hf_model_outputs.last_hidden_state)

    print("Are outputs from both models close?", np.allclose(optimum_model_outputs.last_hidden_state.cpu().detach().numpy(), hf_model_outputs.last_hidden_state.cpu().detach().numpy(), rtol=1e-3, atol=1e-3))

    print("Testing for longer sequence length for Nomic-AI model")

    optimum_model_outputs = model(**long_inputs)
    print(optimum_model_outputs.last_hidden_state)

    hf_model_outputs = hf_model(**long_inputs)
    print(hf_model_outputs.last_hidden_state)

    print("Are outputs from both models close when using longer texts?", np.allclose(optimum_model_outputs.last_hidden_state.cpu().detach().numpy(), hf_model_outputs.last_hidden_state.cpu().detach().numpy(), rtol=1e-3, atol=1e-3))

Relevant output:

Testing for longer sequence length for Nomic-AI model
tensor([[[ 8.7254e-01,  2.1854e+00, -4.2498e+00,  ..., -2.7854e-01,
          -4.8159e-02,  2.2158e+00],
         [ 1.1689e+00,  2.6456e+00, -3.5509e+00,  ..., -1.7534e-01,
          -1.0793e-01,  1.8791e+00],
         [ 5.7188e-01,  2.2278e+00, -3.5168e+00,  ...,  2.5150e-03,
           1.7937e-01,  2.1163e+00],
         ...,
         [ 1.3092e+00,  2.5185e+00, -3.9133e+00,  ..., -4.1854e-01,
           3.9970e-01,  2.5439e-01],
         [ 1.1691e+00,  1.9978e+00, -3.3293e+00,  ...,  3.6745e-02,
          -3.5299e-01, -1.9861e-01],
         [ 7.1556e-01,  1.4866e+00, -4.1898e+00,  ..., -3.4444e-01,
          -2.6155e-01,  9.4057e-01]]])
tensor([[[ 8.7254e-01,  2.1854e+00, -4.2498e+00,  ..., -2.7855e-01,
          -4.8159e-02,  2.2158e+00],
         [ 1.1689e+00,  2.6456e+00, -3.5509e+00,  ..., -1.7534e-01,
          -1.0793e-01,  1.8791e+00],
         [ 5.7188e-01,  2.2278e+00, -3.5168e+00,  ...,  2.5153e-03,
           1.7937e-01,  2.1163e+00],
         ...,
         [ 1.3092e+00,  2.5185e+00, -3.9133e+00,  ..., -4.1854e-01,
           3.9970e-01,  2.5439e-01],
         [ 1.1691e+00,  1.9978e+00, -3.3293e+00,  ...,  3.6745e-02,
          -3.5299e-01, -1.9861e-01],
         [ 7.1556e-01,  1.4866e+00, -4.1898e+00,  ..., -3.4443e-01,
          -2.6154e-01,  9.4057e-01]]], grad_fn=<NativeLayerNormBackward0>)
Are outputs from both models close when using longer texts? True

@xenova (Contributor) commented May 30, 2024

Nice! As a last check, can you test with 8192 tokens?

@bhavika (Author) commented May 30, 2024

Nice! As a last check, can you test with 8192 tokens?

Sure! @xenova here it is, redacting the very long text I threw in there:

Note that I am logging how many tokens the tokenized input ends up with:

import numpy as np
from transformers import AutoTokenizer, AutoModel
from pathlib import Path
from optimum.exporters import TasksManager
from optimum.exporters.onnx import export
import onnx
from optimum.exporters.onnx import validate_model_outputs
from optimum.onnxruntime import ORTModelForFeatureExtraction
import torch

long_input = """<redacted> 
"""


if __name__ == "__main__":
    MODEL_NAME = "nomic-ai/nomic-embed-text-v1.5"
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    long_inputs = tokenizer(long_input, return_tensors="pt", padding=True)
    num_tokens = long_inputs['input_ids'].shape

    print(f"Number of tokens in the input: {num_tokens}")

    hf_model = AutoModel.from_pretrained(
        MODEL_NAME,
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    onnx_path = Path("tests/nomic_bert.onnx")
    onnx_config_constructor = TasksManager.get_exporter_config_constructor(
        "onnx", hf_model, task="feature-extraction", library_name="transformers"
    )
    onnx_config = onnx_config_constructor(hf_model.config)
    onnx_inputs, onnx_outputs = export(hf_model, onnx_config, onnx_path, onnx_config.DEFAULT_ONNX_OPSET)

    print("After export")
    print(onnx_inputs)
    print(onnx_outputs)

    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)

    validate_model_outputs(onnx_config, hf_model, onnx_path, onnx_outputs, onnx_config.ATOL_FOR_VALIDATION)

    model = ORTModelForFeatureExtraction.from_pretrained(
        MODEL_NAME,
        file_name="onnx/model.onnx",
        trust_remote_code=True,
        safe_serialization=True,
        revision="91d2d6bfdddf0b0da840f901b533e99bae30d757",
    )

    print("Testing for longer sequence length for Nomic-AI model")
    optimum_model_outputs = model(**long_inputs)
    hf_model_outputs = hf_model(**long_inputs)
    print("Are outputs from both models close when using longer texts?", np.allclose(optimum_model_outputs.last_hidden_state.cpu().detach().numpy(), hf_model_outputs.last_hidden_state.cpu().detach().numpy(), rtol=1e-3, atol=1e-3))


Output:

❯ python tests/nomic_bert.py
/Users/bhavika/src/github.com/huggingface/optimum/.venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Token indices sequence length is longer than the specified maximum sequence length for this model (8195 > 512). Running this sequence through the model will result in indexing errors
**Number of tokens in the input: torch.Size([1, 8195])**
<All keys matched successfully>
<optimum.exporters.onnx.model_configs.NomicBertOnnxConfig object at 0x16fdbc730>
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:621: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seqlen > self._seq_len_cached:
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (
/Users/bhavika/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert ro_dim <= x.shape[-1]
After export
['input_ids', 'attention_mask', 'token_type_ids']
['last_hidden_state']
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Validating ONNX model tests/nomic_bert.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[x] values not close enough, max diff: 0.00017261505126953125 (atol: 0.0001)
Traceback (most recent call last):
  File "/Users/bhavika/src/github.com/huggingface/optimum/tests/nomic_bert.py", line 146, in <module>
    validate_model_outputs(onnx_config, hf_model, onnx_path, onnx_outputs, onnx_config.ATOL_FOR_VALIDATION)
  File "/Users/bhavika/src/github.com/huggingface/optimum/optimum/exporters/onnx/convert.py", line 233, in validate_model_outputs
    raise error
optimum.exporters.error_utils.AtolError: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- last_hidden_state: max diff = 0.00017261505126953125

The last check, comparing the two models' outputs, never finishes because this process seems to consume too much memory (I have 64 GB on an M3 MacBook). 💀
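(A possible mitigation, untested here: run the PyTorch comparison pass under torch.no_grad() so activations are not retained for a backward pass, e.g.

with torch.no_grad():
    hf_model_outputs = hf_model(**long_inputs)

though the ONNX Runtime side may still be memory-hungry at this sequence length.)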

@bhavika (Author) commented Jun 6, 2024

Update

Tried the test script here with a GPU instance: https://gist.github.com/bhavika/8827463b68a327dfe334a2a7fcc723de

❯ python tests/test_nomicbert.py
/Users/gcpuser/src/github.com/huggingface/optimum/.venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Token indices sequence length is longer than the specified maximum sequence length for this model (8195 > 512). Running this sequence through the model will result in indexing errors
**Number of tokens in the input: torch.Size([1, 8195])**
<All keys matched successfully>
<optimum.exporters.onnx.model_configs.NomicBertOnnxConfig object at 0x16fdbc730>
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:621: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seqlen > self._seq_len_cached:
/Users/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (
/Users/gcpuser/.cache/huggingface/modules/transformers_modules/nomic-ai/nomic-bert-2048/7b260c5676ce4ba4a117f15bc24ac13ab4b81695/modeling_hf_nomic_bert.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert ro_dim <= x.shape[-1]
After export
['input_ids', 'attention_mask', 'token_type_ids']
['last_hidden_state']
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Validating ONNX model tests/nomic_bert.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 16, 768) matches (2, 16, 768)
                -[✓] all values close (atol: 0.0001) 
The argument `trust_remote_code` is to be used along with export=True. It will be ignored.
The ONNX file onnx/model.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
Testing for longer sequence length for Nomic-AI model
Are outputs from both models close when using longer texts? True

@xenova could I get a review here?

@bhavika (Author) commented Jun 20, 2024

@fxmarty @mht-sharma Hi! 👋🏽 Just wondering if there's any interest in accepting this PR? Anything else I can do to land it now?

@fxmarty (Collaborator) left a comment

Added a comment about the test that needs to be moved, but otherwise it looks good to me! Thanks a lot!

Could you also add a test in tests/exporters/onnx/test_exporters_onnx_cli.py with this model? As it uses a custom modeling, you'll need to use trust_remote_code=True. Please decorate it with @slow as well (as there is no tiny model for nomic on the Hub).
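
A rough sketch of what that test could look like (the class and test names here are my guesses, mirroring the CLI invocation above; the actual suite may structure it differently):

import subprocess
import tempfile
import unittest

from transformers.testing_utils import slow


class NomicBertOnnxCLIExportTest(unittest.TestCase):
    @slow
    def test_custom_model_nomic_bert(self):
        # Same CLI invocation as the manual test above; --trust-remote-code
        # is required because NomicBert ships custom modeling code.
        with tempfile.TemporaryDirectory() as tmpdir:
            subprocess.run(
                ["optimum-cli", "export", "onnx",
                 "-m", "nomic-ai/nomic-embed-text-v1.5",
                 tmpdir, "--trust-remote-code"],
                check=True,
            )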

(Two review threads on tests/test_nomicbert.py, both outdated and resolved.)
@bhavika (Author) commented Jun 27, 2024

@fxmarty thanks for the feedback! I can update the tests for sure. Any tips on how to run the test suite/checks for this PR?

@bhavika bhavika changed the title Add nomic-bert config Add support for nomic-ai/nomic-embed-text-v1.5 model Jun 27, 2024
@fxmarty (Collaborator) commented Jun 28, 2024

Thank you! You could for example run: RUN_SLOW=1 pytest tests/exporters/onnx -k "test_custom_model" -s -vvvvv. As the test is decorated with @slow, it is not run in the normal CI (as it requires a large model to run).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bhavika (Author) commented Jul 13, 2024

@fxmarty could you trigger/approve the workflow runs for this PR please?

@fxmarty (Collaborator) commented Jul 16, 2024

@bhavika To solve

FAILED exporters/onnx/test_onnx_export.py::OnnxExportTestCase::test_all_models_tested - AssertionError: Not testing all models. Missing models: {'nomic-bert'}

can you add a PYTORCH_REMOTE_CODE_MODELS here, and add it here:

missing_models_set = (
    TasksManager._SUPPORTED_CLI_MODEL_TYPE
    - set(PYTORCH_EXPORT_MODELS_TINY.keys())
    - set(PYTORCH_TIMM_MODEL.keys())
    - set(PYTORCH_SENTENCE_TRANSFORMERS_MODEL.keys())
)

And adapt test_custom_model accordingly to use PYTORCH_REMOTE_CODE_MODELS.
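
In other words, something like this (my sketch of the requested change; the dict value is an assumption, since there is no tiny nomic checkpoint on the Hub):

# New constant alongside PYTORCH_EXPORT_MODELS_TINY etc., mapping the
# architecture to a checkpoint that needs trust_remote_code=True.
PYTORCH_REMOTE_CODE_MODELS = {
    "nomic-bert": "nomic-ai/nomic-embed-text-v1.5",
}

missing_models_set = (
    TasksManager._SUPPORTED_CLI_MODEL_TYPE
    - set(PYTORCH_EXPORT_MODELS_TINY.keys())
    - set(PYTORCH_TIMM_MODEL.keys())
    - set(PYTORCH_SENTENCE_TRANSFORMERS_MODEL.keys())
    - set(PYTORCH_REMOTE_CODE_MODELS.keys())
)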

There is also

FAILED onnxruntime/test_modeling.py::ORTModelForFeatureExtractionIntegrationTest::test_pipeline_ort_model_8_nomic_bert - KeyError: 'nomic-bert'
FAILED onnxruntime/test_modeling.py::ORTModelForFeatureExtractionIntegrationTest::test_compare_to_transformers_8_nomic_bert - KeyError: 'nomic-bert'

not sure why.
