In [1]:
import torch
from transformers import LongformerConfig, LongformerModel, LongformerTokenizer

In [4]:
from torch.utils.data import TensorDataset, DataLoader

In [5]:
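# Dummy data: token ids and attention masks should be integer tensors, not floats
input_ids, attention_mask, labels = torch.randint(0, 1000, (100, 50)), torch.ones(100, 50, dtype=torch.long), torch.rand(100)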

In [7]:
dataset = TensorDataset(input_ids, attention_mask, labels)

In [9]:
loader = DataLoader(dataset, batch_size=16, shuffle=True)

In [12]:
for batch in loader:
    print(len(batch))
    print([b.size() for b in batch])
    break

3
[torch.Size([16, 50]), torch.Size([16, 50]), torch.Size([16])]
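
With 100 examples and `batch_size=16`, the loader above yields seven batches, and the last one holds only four examples. A minimal sketch of that arithmetic in plain Python, assuming the default `drop_last=False`; the helper name `batch_sizes` is hypothetical and not part of PyTorch:

```python
def batch_sizes(num_examples, batch_size, drop_last=False):
    """Return the size of each batch a loader would yield (hypothetical helper)."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # final, smaller batch
    return sizes

sizes = batch_sizes(100, 16)
print(len(sizes), sizes)
```

Passing `drop_last=True` drops the trailing partial batch, which can be useful when a model or loss expects a fixed batch size.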


In [2]:
config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

In [21]:
document = """
Apple on October 13, 2020 unveiled the newest high-end flagship iPhones, the iPhone 12 Pro and iPhone 12 Pro Max, which are being sold alongside the more affordably priced iPhone 12 and iPhone 12 mini.
Apple this year heavily emphasized the "Pro" aspect of the new iPhone 12 Pro models, suggesting the more expensive iPhones "push the boundaries of innovation" for the people who "want the absolute most out of their iPhones."
Available in 6.1-inch and 6.7-inch size options, the two new Pro models feature full-screen Super Retina XDR displays that are edge-to-edge with the exception of the Face ID notch.
The 6.1-inch iPhone 12 Pro has a resolution of 2532 x 1170 with 460 pixels per inch, while the 6.7-inch iPhone 12 Pro Max has a resolution of 2778 x 1284 and a 458 ppi. The displays offer HDR support with 1200 nits peak brightness, Wide Color, Haptic Touch, and True Tone, all features that have been introduced over the years.
Protecting the displays is a new Ceramic Shield cover, which Apple says "goes beyond glass" and is tougher than any smartphone glass. It's infused with nano-ceramic crystals and offers 4x better drop performance.
The iPhone 12 Pro and Pro Max have an entirely new look that's similar to the design of the iPad Pro with flat edges instead of the rounded edges used in prior models. There's a precision-milled matte glass back surrounded by a stainless steel band.
Color options include graphite, silver, gold, and pacific blue, with the blue shade replacing the midnight green used last year. The iPhone 12 Pro and Pro Max offer IP68 water and dust resistance and can hold up to submersion in 6 meters of water for up to 30 minutes.
The new iPhone 12 models are the first to feature 5G connectivity for faster downloads and uploads, better quality video streaming, improved gaming, and higher-definition 1080p FaceTime calls. To preserve battery life when using 5G, a Smart Data Mode reverts to an LTE connection when 5G speeds aren't necessary.
5G coverage is available worldwide, but only iPhone 12 devices sold in the United States support mmWave 5G, which is the fastest 5G technology available. iPhone 12 models sold in other countries are limited to the slower but more widely available Sub-6GHz 5G connectivity. In the U.S., 5G speeds can be as high as 4Gbps, even in highly populated areas.
Gigabit LTE is supported when 5G isn't available, as is WiFi 6 and Bluetooth 5.0. Like the iPhone 11 models, the iPhone 12 Pro models include a U1 Ultra Wideband chip for spatial awareness and interactivity with other devices that include the U1 feature such as the HomePod mini.
There's a new A14 chip inside the iPhone 12 Pro models, and it is the first chip in the smartphone industry built on a 5-nanometer process for performance and efficiency improvements. Apple says the 6-core CPU and 4-core GPU in the A14 are 50 percent faster than the fastest competing smartphone chips. The A14 chip includes a 16-core Neural Engine that offers an 80 percent increase in performance for machine learning tasks.
For the most part, the iPhone 12 Pro and iPhone 12 Pro Max offer identical specs, but the exception is the camera, where there are some notable differences. The iPhone 12 Pro is equipped with a new seven-element Wide camera with an f/1.6 aperture, which brings 27 percent improved low-light performance for photos and videos.
There's also an Ultra Wide camera with a 120-degree field of view and a 52mm telephoto lens that offers 4x optical zoom.
The iPhone 12 Pro Max also has a seven-element Wide camera with an f/1.6 aperture, but it has a 47 percent larger sensor for an 87 percent improvement in low light. It has the same Ultra Wide camera and a 65mm telephoto lens that enables 5x optical zoom.
While the iPhone 12 Pro offers dual optical image stabilization, the iPhone 12 Pro Max supports sensor-shift optical image stabilization for the Wide lens that stabilizes the sensor instead of the lens itself. Sensor-shift stabilization has previously been limited to DSLRs, and it offers better than ever stabilization for photos and videos.
For both models, A14 chip powers a new image signal processor and computational photography capabilities that weren't possible before. Night mode is now available for the front-facing TrueDepth camera and the Ultra Wide camera, and Deep Fusion is available for all cameras. A new Smart HDR 3 feature brings more true-to-life images.
There's also support for ProRAW, a feature that combines Apple's image processing and computational photography with the versatility of the RAW file format, offering full control over color, detail, and dynamic range.
As for video, the iPhone 12 Pro models support 4K 60fps video and HDR video recording with Dolby Vision at up to 60 frames per second, and better video stabilization. A Night mode time-lapse video option allows for impressive night time videos when a tripod is used.
The iPhone 12 Pro and Pro Max are equipped with a LiDAR Scanner that measures light distance and pixel depth of a scene to map out the area around the iPhone. It allows for more realistic AR experiences and it improves autofocus by 6x in low-light scenes for improved accuracy. The LiDAR Scanner enables Night mode portraits.
"""
tokenized = tokenizer.tokenize(document)
print("Tokenized size", len(tokenized))
print("Num of characters", len(document))

Tokenized size 1146
Num of characters 5231


In [24]:
sample = tokenizer.encode(document, return_tensors="pt")
sample.size()

torch.Size([1, 1148])
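
The encoded sequence is two ids longer than the token list because `encode` wraps the text in the special tokens `<s>` and `</s>`. A quick check of the numbers printed above; the 512-token attention window, and the idea that the model pads internally to a multiple of it, are assumptions about `allenai/longformer-base-4096` that this notebook does not verify:

```python
import math

num_tokens = 1146    # len(tokenizer.tokenize(document)), from the output above
num_chars = 5231     # len(document)
encoded_len = 1148   # sample.size(1)

special_tokens = encoded_len - num_tokens
chars_per_token = num_chars / num_tokens

# Assumption: inputs are padded internally to a multiple of the attention window
attention_window = 512
padded_len = math.ceil(encoded_len / attention_window) * attention_window

print(special_tokens, round(chars_per_token, 2), padded_len)
```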

In [35]:
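# Forward pass; inspect what the model returns
# attention_mask: 1 = real token (no padding here, so all ones)
attention_mask = torch.ones(sample.shape, dtype=torch.long)
# global_attention_mask: 0 = local (sliding-window) attention only, 1 = global attention
global_attention_mask = torch.zeros(sample.shape, dtype=torch.long)
output = model(sample, attention_mask=attention_mask, global_attention_mask=global_attention_mask, return_dict=True)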

In [36]:
output.keys()

odict_keys(['last_hidden_state', 'pooler_output'])

In [40]:
output.last_hidden_state.shape

torch.Size([1, 1148, 768])

In [46]:
query = "iphone 12 technical specs"
small_document = "Apple on October 13, 2020 unveiled the newest high-end flagship iPhones, the iPhone 12 Pro and iPhone 12 Pro Max, which are being sold alongside the more affordably priced iPhone 12 and iPhone 12 mini."

encoded = tokenizer.encode(query, small_document)
decoded = tokenizer.convert_ids_to_tokens(encoded)
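
For a query/document pair, a common choice with Longformer is to give global attention to `<s>` and the query tokens while the document keeps local attention. The sketch below builds such a mask as a plain Python list. `build_global_attention_mask` is a hypothetical helper, and the sizes in the usage lines are made up for illustration rather than taken from the tokenizer:

```python
def build_global_attention_mask(seq_len, num_query_tokens):
    """1 for <s> and the query tokens, 0 (local attention) everywhere else."""
    num_global = min(seq_len, 1 + num_query_tokens)  # +1 for the leading <s>
    return [1] * num_global + [0] * (seq_len - num_global)

# Illustrative sizes only: a 5-token query inside a 20-id encoded pair
mask = build_global_attention_mask(seq_len=20, num_query_tokens=5)
print(mask)
```

Wrapped as `torch.tensor([mask])`, a list like this could be passed as `global_attention_mask` in place of an all-zeros tensor.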

In [19]:
!conda install -y -c plotly plotly=4.13.0


In [12]:
import numpy as np
import plotly.express as px

counts = np.array([851,  79,  28,  13,  12,   6,   4,   1,   3,   3])
counts = counts / 1000
bins = np.array([1.252710e+04, 2.504620e+04, 3.756530e+04, 5.008440e+04, 6.260350e+04, 7.512260e+04, 8.764170e+04, 1.001608e+05, 1.126799e+05, 1.251990e+05])

In [17]:
fig = px.bar(x=bins, y=counts, labels={'x':'document lengths (chars)', 'y':'Proportion of total'})
fig.show()
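
The raw counts sum to 1000, so dividing by 1000 turns them into proportions. A cumulative view makes the skew toward short documents easier to read. This sketch assumes each value in `bins` is the upper edge of its bin, which the notebook does not state:

```python
from itertools import accumulate

counts = [851, 79, 28, 13, 12, 6, 4, 1, 3, 3]
bins = [12527.1, 25046.2, 37565.3, 50084.4, 62603.5,
        75122.6, 87641.7, 100160.8, 112679.9, 125199.0]

total = sum(counts)
# Running share of documents at or below each (assumed) upper bin edge
cumulative = [running / total for running in accumulate(counts)]

for upper_edge, share in zip(bins, cumulative):
    print(f"<= {upper_edge:>9.0f} chars: {share:.1%}")
```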

In [16]:
import torch
a = torch.rand(16, 3)
torch.argmax(a, dim=1).shape

torch.Size([16])
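
`torch.argmax(a, dim=1)` reduces the second dimension and returns one index per row, which is why a `(16, 3)` tensor becomes shape `(16,)`. The same reduction in plain Python, on made-up scores (`argmax_rows` is a hypothetical helper):

```python
def argmax_rows(rows):
    """Index of the largest value in each row (first index wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

# Made-up scores: 3 rows, 3 classes each
scores = [[0.1, 0.7, 0.2],
          [0.9, 0.05, 0.05],
          [0.3, 0.3, 0.4]]
preds = argmax_rows(scores)
print(preds)
```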