<a href="https://colab.research.google.com/github/sahug/ds-bert/blob/main/BERT%20NLP%20-%20Masked%20Language%20Modeling%20using%20BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**BERT NLP - Casual Language Modeling using BERT**

**Language modeling** predicts words in a sentence. There are two forms of language modeling.
- **Causal Language Modeling** predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left. `distilgpt2`
- **Masked Language Modeling** predicts a masked token in a sequence, and the model can attend to tokens bidirectionally. `distilroberta-base`






In [1]:
%pip install -qq datasets

[K     |████████████████████████████████| 346 kB 5.0 MB/s 
[K     |████████████████████████████████| 1.1 MB 20.7 MB/s 
[K     |████████████████████████████████| 86 kB 3.0 MB/s 
[K     |████████████████████████████████| 86 kB 2.4 MB/s 
[K     |████████████████████████████████| 212 kB 48.7 MB/s 
[K     |████████████████████████████████| 140 kB 12.3 MB/s 
[K     |████████████████████████████████| 596 kB 39.1 MB/s 
[K     |████████████████████████████████| 127 kB 15.1 MB/s 
[K     |████████████████████████████████| 94 kB 856 kB/s 
[K     |████████████████████████████████| 144 kB 38.3 MB/s 
[K     |████████████████████████████████| 271 kB 48.9 MB/s 
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.[0m
[?25h

**Load Dataset**

In [2]:
from datasets import load_dataset
eli5 = load_dataset("eli5", split="train_asks[:5000]")

Downloading builder script:   0%|          | 0.00/5.63k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

Downloading and preparing dataset eli5/LFQA_reddit (download: 6.03 MiB, generated: 1.26 GiB, post-processed: Unknown size, total: 1.26 GiB) to /root/.cache/huggingface/datasets/eli5/LFQA_reddit/1.0.0/17574e5502a10f41bbd17beba83e22475b499fa62caa1384a3d093fc856fe6fa...


Downloading:   0%|          | 0.00/3.50k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/576M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/21.1M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/53.0M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/286M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/9.65M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/17.7M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/330M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/18.7M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/36.2M [00:00<?, ?B/s]

Dataset eli5 downloaded and prepared to /root/.cache/huggingface/datasets/eli5/LFQA_reddit/1.0.0/17574e5502a10f41bbd17beba83e22475b499fa62caa1384a3d093fc856fe6fa. Subsequent calls will reuse this data.


**Train and Test Split**

In [3]:
eli5 = eli5.train_test_split(test_size=0.2)

In [4]:
eli5

DatasetDict({
    train: Dataset({
        features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
        num_rows: 4000
    })
    test: Dataset({
        features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
        num_rows: 1000
    })
})

In [5]:
eli5["train"][0], eli5["test"][0] 

({'answers': {'a_id': ['c51x5y6', 'c51wtax'],
   'score': [2, 2],
   'text': ["It's a fair question: We can't derive them very well, actually. \n\nWith fluids, you traditionally model it at the macroscopic end, where it's assumed to be homogeneous if not more (say, incompressible and whatnot). You use parameters describing bulk properties such as density, dynamic and kinematic viscosity and so on. Physically/conceptually the models are quite simple, although the resulting equations are not. ([Navier-Stokes](_URL_3_), for instance)\n\nOn the other end, you have physical chemistry and the basic interactions of molecules. ([intermolecular forces](_URL_2_)) These do depend on the shapes of the molecules, but a lot more than just that. Generally, all molecules attract at far ranges and repel at short range (although the distances and the amount of attraction vary quite a bit). \n\nEven that is a bit simplified, since you're not taking into account chemistry/chemical reactions, which can be 

**Look at Dataset**

In [6]:
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML


def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(
        dataset
    ), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset) - 1)
        while pick in picks:
            pick = random.randint(0, len(dataset) - 1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(
                lambda x: [typ.feature.names[i] for i in x]
            )
    display(HTML(df.to_html()))

In [8]:
show_random_elements(eli5["train"])

Unnamed: 0,q_id,title,selftext,document,subreddit,answers,title_urls,selftext_urls,answers_urls
0,51mslv,What is sea's lather made of?,,,askscience,"{'a_id': ['d7gmcb2'], 'text': ['Foam (lather) is the result of the stabilization of the interface between a liquid and gas. In everyday life you would recognize soaps and other surfactants as very efficient stabilizers that prefer the interface over the bulk solution. However, careful observation will show differences in bubble buoyancy and stability between pure water and various solutions (alcohol, salt, sugar and artificial flavorings ect). Sea foam is likely just a result of the high ion concentration and other small molecules present in the water. In certain coastal areas it could also be helped along by industrial and municipal runoff. Any analysis of the foam show mostly water along with probably slightly higher concentrations of the dissolved solids than in the bulk ocean water. Source: I work in the paint industry, foam is a big deal.'], 'score': [3]}",{'url': []},{'url': []},{'url': []}
1,4e6o3v,What is transition energy?,So what is transition energy? I am specifically interested in transition metals.,,askscience,"{'a_id': ['d1xiflg', 'd1xugrm'], 'text': ['In chemistry, transition energy is the amount of energy it takes to cause an electron to transition from one energy level to another. Typically you'll only see this for hydrogen because it only has one electron, and anything beyond that gets complicated really quickly. The name ""transition metal"" actually has nothing to do with transition energy. A transition metal is just any element with a partially filled d-orbital.', 'In the gas phase there is no d to d transition energy. However in solid phase compound and compounds in solution will in general have a d to d transition energy. This splitting of the d orbitals is down to the ligand field ( different groups will be arranged around the metal in a specific way eg the H2O molecules in the Fe(H2O)6 2+ ion is arranged in an octahedra), the arrangement of groups around the metal can be classified to have a certain symmetry defined by symmetry elements and group theory. The dx2-dy2 and dz2 form a group called Eg dxy dyz dxz form a group called T2g due to the shape of the Eg orbitals and the interaction with the ligands they are of higher energy than the t2g orbitals, this means they have a transition energy between them, which means they can absorb photons of light to promote the electron from one level to another, this gives rise to properties like color and can also explain the magnetic properties of certain compounds.'], 'score': [4, 3]}",{'url': []},{'url': []},{'url': []}
2,1z3e5v,"When an embryo is developing, how do the first few cells ""decide"" which end will be the head and which end will be the feet?",,,askscience,"{'a_id': ['cfq7jfj', 'cfqiw6h'], 'text': ['After the blastula, the embryo undergoes gastrulation where the germ layers are formed. These 3 germ layers will become all the different body tissues, and I'd recommend reading up on [germ layers](_URL_0_). Cells that differentiate do so by signalling cues from other cells through the process of signal transduction. These cues are not fully understood, but there are chemicals involved that affect which genes are read and by how much. These chemicals are released in specific patterns that govern how and when our cells differentiate.', 'Some of it is controlled by things called homeobox genes. They're very highly conserved genes whose purpose is to lay out the body plan along an axis, meaning they determine where the different limbs and such should go. A lot of research on homeobox (or hox) genes has been done in fruit flies, Drosophila melongaster, you can get an idea of what happens where those genes altered form the photos below. There is a lot more to it than just that, but homeobox genes play a major role in establishing the body layout. [The left image is a normal fruit fly. The right is one that has had its homebox genes altered so in place of antenna, legs grow from the head. That's what those long appendages are.](_URL_1_)'], 'score': [6, 4]}",{'url': []},{'url': []},"{'url': ['http://en.wikipedia.org/wiki/Germ_layer', 'http://biology.kenyon.edu/courses/biol114/Chap13/antennapedia.gif']}"
3,xor9i,"Can I train myself to like food, music, etc. that I currently don't like?",,,askscience,"{'a_id': ['c5o8jwj'], 'text': ['To an extent. We know that exposure to certain unappealing stimuli can increase our tolerance for it in a number of ways (adaptation, mostly). It's the same principle behind being exposed to a noxious smell, over time you will rate it as not quite as adverse as you once did (forget the cite on this paper, there are a few). However in terms of liking food, music, etc, that process takes longer. But it will only work to a certain extent because a lot of what we like depends on biology and upbringing, and you won't always be able to trump that. For example, if you are brought up around country music and your parents like country music, you begin to associate country music with being happy because that's what you observe in your parents and that's what you were exposed to. That could give you a preference for country music (there's actually a psychology prof at my school trying to get their kid to like a certain type of music by using this sort of conditional learning).'], 'score': [12]}",{'url': []},{'url': []},{'url': []}
4,ij0f0,Anyone out there an expert in opiate pharmacology and physiology?,"I am attempting to research the process that causes sugar cravings from opioid medications. Specifically, Methadone induces significant cravings for sugar and as a result causes a large amount of weight gain. many people on methadone say that these sugar cravings are so intense as to be irresistable. This is true of methadone both used for addiction and for pain management, so it isn't an addiction based problem. \n\nBut, this effect isn't limited to methadone, it is observed in every opioid based medication. I myself am on fentanyl patches and have noticed a significant change in my food cravings, eating a lot more ice cream than I ever have in the past. \n\nSo what is the source of these cravings? I know methadone has been shown to change the levels of Glucocorticoids, specifically corticosterone in mice, but I don't know if that would lead to increased insulin production and sugar cravings through hypoglycemia. It could be another process like activation of the reward/motivation pathway in the ""limbic system"" of the Basal ganglia, or due to decreased release of cortisol or norepinephrine. I can't find a single study on this particular phenomenon, at least not directly addressing the question.\n\nSo does anyone know how the psychopharmacological impact of opioid drugs leads to cravings for sugar? And, a tangentially related question, what would be a way to reduce these sugar cravings and prevent weight gain and health problems? Is there any medication, vitamin or herbal supplement or mineral that would satisfy the craving for sugar but not increase blood glucose levels and weight gain? \n\nSo many thanks for any answers to this question.",,askscience,"{'a_id': ['c245m3g', 'c245e55', 'c2455qn'], 'text': ['I am. And I think I can answer your question, the other two guys have skimmed over one vital aspect of opioid pharmacology(not trying to be a jerk or anything, I'm guessing this came off as though I'm a dick), and there is a very good reason as to why methadone specifically seems to cause this more than other opiates. Some opiates can, and will bind to other kinds of opioid receptors other than just the mu receptor. The main reason you become hungry for foods that provide a reward(taste really sweet, stuff you enjoy anyway, just maybe not to the same degree) DOES have to do with the mu receptor in that you're getting more reward out of something already rewarding(being on an opiate, be it for pain or recreation you are going to feel rewarded using the drug) but the biggest reason has to do with kappa-type 3 opioid receptor(also known as the nociceptin receptor) which actually increases appetite. Methadone binds to MU, Delta, and Kappa opioid receptors as well as having, if very mild, some NDMA antagonistic properties. Also, some people on medications for chronic pain notice they gain weight more rapidly than before, and there is a reason for that too--they are chronic pain patients, they're using the drugs to treat pain, naturally they don't exercise that often or that well(which can be one of the root causes of their pain, specifically lower back pain). With regards to medications you could take to prevent this from happening, well only things that inhibit your appetite would really be effective e.g. ritalin. However, they aren't made for this indication and its really not a very good idea really, it would work, most likely that is, but I wouldn't really be willing to prescribe a stimulant this way(I'm not the kind of doctor who treats ADD/ADHD either). keep in mind this isn't medical advice.', 'I'm not an opiate sort of guy, but I'll see if if I can help. [This abstract](_URL_1_) describes the effects of naloxone on glucose and insulin levels in dogs. [This abstract](_URL_0_) on opiate receptor blockade might be useful too. If nothing else, both papers will have a very nice trail of references from the intro and discussion sections. I couldn't find free versions of either, though. So you either need to keep looking ([GoogleScholar](_URL_0_) is excellent for this), or consult a librarian. .  > Is there any medication, vitamin or herbal supplement or mineral that would satisfy the craving for sugar but not increase blood glucose levels and weight gain? Generally, no. To anyone reading: This is not medical advice, always consult with your doctor when altering your diet. One could try switching to one of the non-caloric sugar substitutes. If you're trying to lose weight, making a food diary will help. ...so, keep a log of everything you eat and drink for one week, then switch from e.g. regular to diet soda, and see if this helps. . P.S. on Brain_Doc82 saying ""Sugar also acts on those mu opiate receptors [...]"" Don't read this too literally. While quite plausible that sugar is causing an activity change in neurons laden with mu receptors, I know of no evidence of glucose directly binding to opiate receptors.', '> It could be another process like activation of the reward/motivation pathway in the ""limbic system"" of the Basal ganglia, or due to decreased release of cortisol or norepinephrine. I can't find a single study on this particular phenomenon, at least not directly addressing the question. This was my first thought, and there is a good bit of research on this, though not that directly answers your question. Sugar cravings have been shown to be associated with activation of mu opiate receptors to release dopamine, and opiate blockers have been shown to *reduce* sugar cravings. Sugar also acts on those mu opiate receptors, and when you're already stimulating them with the opiate, your body is likely just craving more, and knows that it can get the same ""high"" from adding refined sugar to the diet. In someone already using fentanyl patches (likely for pain?) my guess is this advice won't work, but I'd recommend exercise or other natural ways to stimulate the reward circuitry to reduce sugar cravings. There are morphine blockers, but that would defeat the purpose of the opiate in the first place.'], 'score': [7, 3, 2]}",{'url': []},{'url': []},"{'url': ['http://www.sciencedirect.com/science/article/pii/0024320582905392', 'http://ajpendo.physiology.org/content/250/3/E236.short']}"
5,8iava5,Why is the relativistic adiabatic index 4/3?,"I was told that in the relativistic limit the adiabatic index approaches 4/3 for a monoatomic gas instead of 5/3 in the non\-relativistic case. I was told this occurs due to a reduction in degree of freedom but this may be incomplete and does not quite explain the new expression since adiabatic index = \(n \+ 2\)/n where n is the # of degrees of freedom. Thus I am wondering both quantitatively and qualitatively, why does the adiabatic index decrease, and to 4/3 specifically, in the relativistic regime for a monoatomic gas?",,askscience,"{'a_id': ['dyrhx0g'], 'text': ['The formula for adiabatic index in terms of number of degrees of freedom relies on the assumption that the energy is quadratic in the variable associated to the degree of freedom. This works well for non-relativistic gasses because all the commonly encountered degrees of freedom are decently approximated by quadratic kinetic/potential energy terms. I believe you can derive the adiabatic index from equipartition theorem directly. For general power laws, the equipartition theorem predicts that the average amount of energy in a given degree of freedom is inversely proportional to the exponent in the energy term. [At high momentum the relativistic kinetic energy term is approximately linear in momentum, so you expect each kinetic degree of freedom to behave as if it were two ordinary quadratic degrees of freedom,](_URL_0_) hence a monatomic gas in the relativistic regime looks as if it has 6 quadratic degrees of freedom and you get an adiabatic index of 4/3. I'm not sure whether or not that counts as a qualitative explanation, but I'm also not sure what a qualitative explanation would be.'], 'score': [2]}",{'url': []},{'url': []},{'url': ['https://en.wikipedia.org/wiki/Equipartition_theorem#Extreme_relativistic_ideal_gases']}
6,orwu1,"AskScience AMA series: We are researchers in Quantum Computing and Quantum Information, here to answer your questions.","Hi everyone, we are BugeyeContinuum, mdreed, a_dog_named_bob, LuklearFusion, and qinfo, and we all work in Quantum Computing and/or Quantum Information. Please ask us anything!\n\nP.S.: Other QIP panelists are welcome to join in the fun, just post a short bio similar to the ones below, and I'll add it up here :).\n\nTo get things started, here's some more about each of us:\n\nBugeyeContinuum majored in physics as undergrad, did some work on quantum algorithms for a course, and tried to help a chemistry optics lab looking to diversify into quantum info set up an entanglement experiment. Applied to grad schools after, currently working on simulating spin chains, specifically looking at quenching/annealing and perhaps some adiabatic quantum computation. Also interested in quantum biology, doing some reading there and might look to work on that once present project is done.\n\nmdreed majored in physics as an undergrad, doing his senior thesis on magnetic heterostructures and giant magentoresistance (with applications to hard drive read-heads.) He went to grad school immediately after graduating, joining a quantum computing lab in the first semester and staying in it since. He is in his final year of graduate school, and expects to either get a job or postdoc in the field of quantum information.\n\nLuklearFusion did his undergrad in Mathematical Physics, with his senior research project on quantum chaos. He's currently 6 months away from a _URL_0_. in Physics, studying the theory behind devices built from superconducting qubits and hybrid systems. He is also fairly well versed in quantum foundations (interpretations of quantum mechanics) and plans on pursuing this in his PhD research. He is currently applying to grad schools for his PhD, if anyone is interested in that kind of thing. He is also not in a North American timezone, so don't get mad at him if he doesn't answer you right away.\n\nqinfo is a postdoc working in theoretical quantum information, specifically in quantum error correction, stabilizer states and some aspects of multi-party entanglement.",,askscience,"{'a_id': ['c3jkg0j', 'c3jk2qu', 'c3jm16e', 'c3jl350', 'c3jln21', 'c3jmy0r', 'c3jn01e', 'c3jn60f', 'c3jmp38', 'c3jmorm', 'c3jn4y2', 'c3jo42p', 'c3jqfav', 'c3jnkgd', 'c3jmxk4', 'c3jmnrb', 'c3jlkp6', 'c3jkygo', 'c3jl9q3', 'c3jkrt3', 'c3jmsva', 'c3jmp5l', 'c3jn1ig', 'c3jkuof', 'c3jmbyy'], 'text': ['Basic questions from an entirely uneducated person: 1. What materials to you see (or are available) to serve as quantum computing devices? 2. What are the benefits of QC over current computational technology? 3. What will serve as the equivalence of a bit and transistor in a quantum computer? Thanks guys!', 'What do you think of D-Wave's claim to have a working quantum computer? Which modality do you think will work first? Which do you think will work best in the long run? Do you think topological quantum computing will be viable? Will it hold advantages over other systems? A lot of people talk about the fast algorithm aspect of QC, but what about using it to simulate quantum systems. Any immediate cool applications from that? Do any of you care about, or deal with, quantum foundations and interpretations of quantum mechanics? Anything you'd like to say about that?', 'Can you explain how a quantum computer works to someone who has very little understanding of physics?', 'What fact could each of you tell us that would seem totally baffling and/or counterintuitive on the surface?', 'I am currently on track to graduate with a BS in physics and planning on getting an MS in EE how useful will that be if/when quantum computing becomes practical?', 'Sweet!! Im in quantum information too and wondering which one of our many competitors you are :P', 'Former QC researcher here (I worked in J. Mooij's group at TU-Delft), but I've been out of the field for several years. IIRC a minimum of 1000 qubits would be necessary for a QC to outperform the best classical computer. What are the current prospects for being able to build a 1000-qubit computer and still prevent environmental decoherence?', 'What are the most important implications of [QIP = PSPACE](_URL_0_), and how has it affected research into quantum computing, if it all?', 'How applicable would this technology be to something like protein folding?', 'I am in the final year of my PhD and am a physicist working in a chemistry department. Basically I've been studying [these types of systems](_URL_1_) (see theory section) and, as you can see, there basically the molecular analogue of systems such as electron spin/light polarisation etc. They even have a sense of on and off (slight subtlety, possibly BS). Since im a physicist, i can see some potential for these that chemist are not particularly interested in. My question to you is if we could get these to entangle, could they be a candidate for a qubit? There molecules to so it is entirely feasible they could talk to each other (because of intermolecular bonding, i realise this is quite vague). My understanding of fundamental QM is pretty good but and i have a decent understanding of entanglement from Susskind's advanced lectures, but actual quantum computing is much a more complex (as im sure your aware). Would be great to hear from some experts!', 'How would you program a quantum computer? Assuming we manage to create a working quantum computer, how would it be different from an ordinary computer in terms of componets like hard drive, RAM, and graphics card. Do they need to use qubits too in order to be compatible?', 'I am a postdoc working in theoretical quantum information, specifically in quantum error correction, stabilizer states and some aspects of multi-party entanglement. I would like to join in the fun!', 'Oh man, awesome. I have a couple questions! 1) Brian Greene's PBS program described the hypothetical functionality of quantum computers as similar to being able to explore all paths of a maze at once—since the only path a particle *could* take is the one leading out of the maze, when observed it will have a 100% probability of being in the correct place. Would you characterize that as an accurate analogy? I've got a pretty basic understanding of quantum mechanics and a slightly less basic understanding of the structure of normal CPUs, and I find the idea of an actual implementation of this really baffling. Would quantum computers have some sort of physical structure that corresponds to ""paths"" in a maze? Is there a simplified way to explain how a piece of hardware could be set up using superposition to do calculations? 2) I've heard quantum cryptography described as ""more physics than math""—i.e. properties of the particles used (possibly entanglement?) would create security in a pretty straightforward way instead of requiring the incorporation of things like number theory. Do you agree with that? Have people come up with hypothetical one-way functions that could be easily checked with a quantum computer but not solved by one, or does that become unnecessary? Sorry if I sound kind of dumb! I'm only in my second year of university and these threads are giving me a headache, despite how interesting I find them. Alas.', 'From my perspective, as someone who does work in number theory, it looks like there's been very little in the last few years of problems we care about being found to have fast quantum algorithms. The only obvious exceptions are Shor's algorithm and estimating Gauss sums. Is this an accurate assessment and if so, why is there so much apparent difficulty? There seems to be a heavy disconnect between the people on the algorithms side and the people like most of you guys who are doing practical implementation. Is this disconnect actually there?', 'Okay, I'm going to risk sounding horribly dense but, **what the hell *is* quantum computing?** I'm a CS/Networking student and one of the biggest things that I took on was building a 4-bit computer from base logic (That was big). But as I look deeper into the world of EE/CS, I'm more and more befuddled: how are we supposed to work out quantum-level stuff when we're mucking about with 40-atom-wide transistors and we're STILL having problems? also: Boxers or Briefs? I hear physicists rock the briefs.', 'What are the top research institutions working on quantum computing today? I am hoping to go to grad school for quantum computing, but I only have a short list of places so far that seem to be doing the top research in the area. Also, what sort of distribution of research being done now is computer science based versus physics based? It seems like most work being done today is on implementing quantum computers rather than working on quantum algorithms, would you agree with that?', 'To start my question, I will state the assumptions that have lead to this notion. Assumption: Quantum Computing is tough in reality because any small movements break the tolerances of accuracy required for the 'apparatus' to function nominally. With the recent developments in locking magnetic dipoles - as seen in the video here: [example ](_URL_2_) - is there already a drive to get quantum computing technology locked in these states to prevent the disruption in tolerances mentioned in the assumption?', 'My question isn't actually scientific. I'm sorry if that's frowned upon. I wanted to go into quantum computing, went to university for computer science, found it was basically a degree in programming and have reapplied for theoretical physics. Was this a good idea, and how suitable is quantum computing for someone who just wants to get a PhD 'for science', and then go to work in a financial institution?', 'BugeyeContinuum: In relation to quantum bio and stat mech, how much influence does QM impact the reactions involved. Is there situations in nature where reactions loose predictability in bulk, or since reactions happen with such magnitudes of molecules that they always appear deterministic? I guess I'm essentially asking is there any strange behavior that is for chemistry analogous such as Bose–Einstein condensates are to phase properties?', 'What are the major engineering hurdles you can forsee? How energy intensive does it appear QC modules will be? Will they require all of the cooling that current modules do? I work in heat pipe design, that's why I'm curious :)', 'I did my undergrad capstone paper on quantum computing in the spring of 2010 but have been completely out of the loop since then. Have there been any huge developments in the past 2 years that are worth mentioning?', 'How would quantum computing fit into the traditional models of computers (Von Neumann and Harvard)? If not, what would the model look like in reference to these models.', 'What does Quantum Computing mean for the future of the internet and the globalization of information sharing and communication?', 'What do you think of anyonic quantum computing like they're working on at Microsoft Station Q?', 'Can you summarize how quantum theories have changed since 2009 or so?'], 'score': [47, 36, 9, 6, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]}",{'url': []},{'url': ['M.Sc']},"{'url': ['http://arxiv.org/abs/0907.4737', 'http://jcp.aip.org/resource/1/jcpsa6/v135/i22/p224306_s1', 'http://www.youtube.com/watch?v=VyOtIsnG71U']}"
7,553siw,Is it possible to pick 6 colors with the most amount of contrast between them?,"Not sure if this is in the right sub, or if it's the right flair. Shout-out to /r/cubers!\n\nI'm asking this because I solve Rubik's Cubes and the like as a hobby. A common thing to do is to swap out the stickers on your cubes to increase the contrast between them so that color recognition is easier. \n\nMy question is, is there a set of 6 colors that is optimally contrasted from one another?",,askscience,"{'a_id': ['d87hgma', 'd8844ko'], 'text': ['if you have normal color vision (i.e. you are not colorblind), then the most [unique hues are red, green, blue, and yellow] (_URL_0_) - these colors serve as something like anchor points for all the other colors you see (all those other colors seem somehow related to one of the unique hues). red and green are as different as two colors can be, except for yellow and blue, which are also as different from one another as colors can be. if you add to these four the poles of the brightness axis - i.e. white and black, which are obviously strongly contrasting - then you have the six most contrasting colors in human vision. the problem here is that ~5% of human males do not see red and green hues, and so for them there cannot be an equivalent set of contrasts (it would have to be some kind of combination of blue/yellow and black/white, e.g. black vs white, darkblue vs brightyellow, and brightblue vs darkyellow).', 'Look at [this picture](_URL_1_). It's the a-b plane in the color space Lab, which was designed to ""make sense"" to humans in terms of distance (so contrast as you put it, even though the term is a bit incorrect here). As you can see, picking 5 or 6 colors is a bit hard by hand and ppl are gonna argue, but picking 4 is easy...'], 'score': [6, 2]}",{'url': []},{'url': []},"{'url': ['http://131.111.190.111/jdmollon/papers/MollonJordan1997UniqueHues.pdf', 'https://www.nippondenshoku.co.jp/web/english/colorstory/images/07_what_is_ucs_2.jpg']}"
8,16udsp,"[Materials Science/Metallurgy] AskScience: I was told about an odd thing called ""Lead Disease"" today. I tried Googling it, but got no results. Repost as I received no visible responses yesterday.","I was doing some historical wargaming today with an old buddy who makes a lot of his own figurines. We were talking about the materials used and how he had managed to acquire the better part of a ton of scrap lead from various sources. He then told me a story about the person he learned the skill from.\n\nThis person had managed to acquire several hundred pounds of old lead sheeting from a demolished hospital. This was, as you've probably guessed, the lead sheeting used keep any radiation from escaping from the X-ray and nuclear medicine rooms. The lead seemed fine and was melted down and used to create a large number of figurines. A few months later the maker noticed that there was an odd bulge in the paint on one figure. Pressing into it he discovered, to his horror, that the base lead of the figurine had been partially converted into a greyish powdery substance.\n\nNot panicking, he decided that some foreign matter must have gotten into the pour and this was what he was seeing. But then over the days that followed, more and more of the figurines made from this lead began to show the same issue. Some weeks later figurines made from lead from different sources began to show the same effect. It wasn't until the maker threw out every figurine made from the hospital lead and every other figurine that showed traces of what he called ""lead disease"" that the problem stopped.\n\nMy questions:\n\n* Is this ""lead disease"" a real thing?\n* If so, what is happening to the lead? Some sort of crystallization maybe?\n* Would the effect be due in any way to the decades of radiation exposure the lead experienced?\n* If this isn't a real thing, what was the maker seeing?\n* And finally, in either case, what would cause the apparent spread of the 'disease'?\n\nThank you very much for your time and reading over this.\n\n\n**EDIT** I've been reading up on 'zinc pest' (thanks to PYREX_500ml) and I'm wondering if this particular hospital might have cheaped out on their lead shielding and used lead with a high degree of zinc in it. Zinc is often found with lead in nature and may have made the lead figurines susceptible to 'zinc pest'.",,askscience,"{'a_id': ['c7zgxs4'], 'text': ['This sounds a bit like zinc pest, where lead-zinc alloys slowly blister and fall apart. It's usually discussed about lead impurities in zinc but I've never seen any evidence that lead with sufficient zinc impurities wouldn't experience it too. There is also a more commonly known phenomenon called tin pest in which tin which has been cooled to sufficiently low temperatures changes crystal structure and begins breaking down, catalyzing other tin to do the same. I'm not aware of any studies on how prevalent tin pest is in alloys but it's possible the lead he got from the hospital was a lead-tin alloy with enough tin that it began decomposing. However, neither of these would explain why other, pure, lead from other sources would decompose unless he melted it all together or the tin or zinc left enough residue behind to affect other pieces of lead, which seems unlikely to me. Radiation exposure damaging the crystalline structure of materials is well-documented but no latent effects should have persisted after melting and recasting, so radiation is probably unrelated.'], 'score': [2]}",{'url': []},{'url': []},{'url': []}
9,a7k4zu,Is there a formula to find out at which distance (me-points) two points look like one?,"Hey, I have a question about the resolving power. At which distance (from the points) do two 5cm points, with 5cm in between them, look like one? Is there some kind of formula to find out from any distance (me-points), that I could use like a cross multiplication?\n\n & #x200B;\n\nIf there were two points on the moon, it would look like one seen from Earth. But at what dimensions/distance etc..?\n\n & #x200B;\n\nThanks",,askscience,"{'a_id': ['ec41kng'], 'text': ['So, in practice this depends on how good your eyesight is etc. But there is an absolute physical limit for how well you can ever resolve something with an aperture of some fixed size. This is the *~~diffusion~~ diffraction limit*, where the fundamental waviness of light means that you can't see an image below that resolution, regardless of how perfect the optics of your eye or telescope are. The formula comes out to: d/D = 1.22 λ/A where d is the minimum distance between the two points, D is the distance to the points, λ is the wavelength of light (lower wavelength = better resolution), and A is the diameter of the aperture you're using - basically, the size of your pupil if you're using your naked eye. Taking d=5 cm, as you say, A=5mm for the pupil size (somewhat arbitrary), and λ=300 nm for some arbitrary wavelength of visible light, we get D=700 m. So, well below the distance to the Moon. This is part of why we map the Moon with lunar satellites rather than trying to just build a really big telescope on Earth.'], 'score': [5]}",{'url': []},{'url': []},{'url': []}


**Extract Text**

**Flatten** the dataset for easy extraction. We will be able to extract the data like `answers.text` instead of `["answers"]["text"]`

In [9]:
eli5 = eli5.flatten()

In [10]:
eli5["train"]["answers.text"][0]

["It's a fair question: We can't derive them very well, actually. \n\nWith fluids, you traditionally model it at the macroscopic end, where it's assumed to be homogeneous if not more (say, incompressible and whatnot). You use parameters describing bulk properties such as density, dynamic and kinematic viscosity and so on. Physically/conceptually the models are quite simple, although the resulting equations are not. ([Navier-Stokes](_URL_3_), for instance)\n\nOn the other end, you have physical chemistry and the basic interactions of molecules. ([intermolecular forces](_URL_2_)) These do depend on the shapes of the molecules, but a lot more than just that. Generally, all molecules attract at far ranges and repel at short range (although the distances and the amount of attraction vary quite a bit). \n\nEven that is a bit simplified, since you're not taking into account chemistry/chemical reactions, which can be important even to a fluid that doesn't seem to be 'reacting'. For instance, w

**Preprocess**

In [11]:
%pip install -qq transformers

[K     |████████████████████████████████| 4.2 MB 5.2 MB/s 
[K     |████████████████████████████████| 6.6 MB 43.8 MB/s 
[?25h

In [12]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

Downloading:   0%|          | 0.00/480 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

In [13]:
def preprocess_function(examples):
    return tokenizer([" ".join(x) for x in examples["answers.text"]], truncation=True)

In [14]:
tokenized_eli5 = eli5.map(preprocess_function, batched=True, num_proc=4, remove_columns=eli5["train"].column_names)

        

#1:   0%|          | 0/1 [00:00<?, ?ba/s]

#0:   0%|          | 0/1 [00:00<?, ?ba/s]

#3:   0%|          | 0/1 [00:00<?, ?ba/s]

#2:   0%|          | 0/1 [00:00<?, ?ba/s]

        

#1:   0%|          | 0/1 [00:00<?, ?ba/s]

#0:   0%|          | 0/1 [00:00<?, ?ba/s]

#3:   0%|          | 0/1 [00:00<?, ?ba/s]

#2:   0%|          | 0/1 [00:00<?, ?ba/s]

In [15]:
tokenized_eli5

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 4000
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 1000
    })
})

In [16]:
print(tokenized_eli5["train"]["input_ids"][0])

[0, 243, 18, 10, 2105, 864, 35, 166, 64, 75, 34882, 106, 182, 157, 6, 888, 4, 1437, 50118, 50118, 3908, 26572, 6, 47, 10341, 1421, 24, 23, 5, 12303, 3866, 18137, 253, 6, 147, 24, 18, 9159, 7, 28, 9486, 42019, 114, 45, 55, 36, 28357, 6, 45059, 5224, 4748, 8, 99, 3654, 322, 370, 304, 17294, 9072, 8533, 3611, 215, 25, 16522, 6, 6878, 8, 449, 10748, 5183, 17737, 16254, 1571, 8, 98, 15, 4, 27592, 3435, 73, 38498, 13851, 5, 3092, 32, 1341, 2007, 6, 1712, 5, 5203, 43123, 32, 45, 4, 47567, 487, 37041, 12, 5320, 6568, 47620, 1215, 42703, 1215, 246, 1215, 238, 13, 4327, 43, 50118, 50118, 4148, 5, 97, 253, 6, 47, 33, 2166, 11877, 8, 5, 3280, 11324, 9, 20237, 4, 47567, 8007, 119, 4104, 32188, 1572, 47620, 1215, 42703, 1215, 176, 1215, 35122, 1216, 109, 6723, 15, 5, 16499, 9, 5, 20237, 6, 53, 10, 319, 55, 87, 95, 14, 4, 25817, 6, 70, 20237, 5696, 23, 444, 16296, 8, 2851, 523, 23, 765, 1186, 36, 24648, 5, 21459, 8, 5, 1280, 9, 13003, 10104, 1341, 10, 828, 322, 1437, 50118, 50118, 8170, 14, 16, 10, 8

**Capture Truncated Text**
When we tokenize texts the tokenizer truncates some of the texts based on default size. So we need a second preprocessing function to capture text truncated from any lengthy examples to prevent loss of information. This preprocessing function should:

- Concatenate all the text.
- Split the concatenated text into smaller chunks defined by block_size.

In [17]:
BLOCK_SIZE = 128

def group_text(examples):
  concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
  total_length = len(concatenated_examples[list(examples.keys())[0]])
  result = {
      k: [t[i: i+ BLOCK_SIZE] for i in range(0, total_length, BLOCK_SIZE)]
      for k, t in concatenated_examples.items()
      }
  result["labels"] = result["input_ids"].copy()
  return result

In [18]:
lm_dataset = tokenized_eli5.map(group_text, batched=True, num_proc=4)

        

#0:   0%|          | 0/1 [00:00<?, ?ba/s]

#1:   0%|          | 0/1 [00:00<?, ?ba/s]

#3:   0%|          | 0/1 [00:00<?, ?ba/s]

#2:   0%|          | 0/1 [00:00<?, ?ba/s]

        

#0:   0%|          | 0/1 [00:00<?, ?ba/s]

#2:   0%|          | 0/1 [00:00<?, ?ba/s]

#1:   0%|          | 0/1 [00:00<?, ?ba/s]

#3:   0%|          | 0/1 [00:00<?, ?ba/s]

In [19]:
lm_dataset

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 7420
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 1909
    })
})

In [20]:
print(lm_dataset["train"]["input_ids"][0])

[0, 243, 18, 10, 2105, 864, 35, 166, 64, 75, 34882, 106, 182, 157, 6, 888, 4, 1437, 50118, 50118, 3908, 26572, 6, 47, 10341, 1421, 24, 23, 5, 12303, 3866, 18137, 253, 6, 147, 24, 18, 9159, 7, 28, 9486, 42019, 114, 45, 55, 36, 28357, 6, 45059, 5224, 4748, 8, 99, 3654, 322, 370, 304, 17294, 9072, 8533, 3611, 215, 25, 16522, 6, 6878, 8, 449, 10748, 5183, 17737, 16254, 1571, 8, 98, 15, 4, 27592, 3435, 73, 38498, 13851, 5, 3092, 32, 1341, 2007, 6, 1712, 5, 5203, 43123, 32, 45, 4, 47567, 487, 37041, 12, 5320, 6568, 47620, 1215, 42703, 1215, 246, 1215, 238, 13, 4327, 43, 50118, 50118, 4148, 5, 97, 253, 6, 47, 33, 2166, 11877, 8, 5, 3280, 11324, 9, 20237]


In [21]:
print(lm_dataset["train"]["labels"][0])

[0, 243, 18, 10, 2105, 864, 35, 166, 64, 75, 34882, 106, 182, 157, 6, 888, 4, 1437, 50118, 50118, 3908, 26572, 6, 47, 10341, 1421, 24, 23, 5, 12303, 3866, 18137, 253, 6, 147, 24, 18, 9159, 7, 28, 9486, 42019, 114, 45, 55, 36, 28357, 6, 45059, 5224, 4748, 8, 99, 3654, 322, 370, 304, 17294, 9072, 8533, 3611, 215, 25, 16522, 6, 6878, 8, 449, 10748, 5183, 17737, 16254, 1571, 8, 98, 15, 4, 27592, 3435, 73, 38498, 13851, 5, 3092, 32, 1341, 2007, 6, 1712, 5, 5203, 43123, 32, 45, 4, 47567, 487, 37041, 12, 5320, 6568, 47620, 1215, 42703, 1215, 246, 1215, 238, 13, 4327, 43, 50118, 50118, 4148, 5, 97, 253, 6, 47, 33, 2166, 11877, 8, 5, 3280, 11324, 9, 20237]


For **Causal Language Modeling**, use `DataCollatorForLanguageModeling` to create a batch of examples. It will also dynamically pad your text to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the tokenizer function by setting padding=True, dynamic padding is more efficient.

In [22]:
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")

**Train**

To **fine-tune** a model in TensorFlow, start by converting your datasets to the tf.data.Dataset format with to_tf_dataset. Specify inputs and labels in columns, whether to shuffle the dataset order, batch size, and the data collator:

In [23]:
tf_train_set = lm_dataset["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "labels"],
    dummy_labels=True,
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)

tf_test_set = lm_dataset["test"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "labels"],
    dummy_labels=True,
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator,
)

**Optimizer**

In [24]:
from transformers import create_optimizer, AdamWeightDecay
optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)

**Model**

In [25]:
from transformers import TFAutoModelForCausalLM
model = TFAutoModelForCausalLM.from_pretrained("distilroberta-base")

Downloading:   0%|          | 0.00/465M [00:00<?, ?B/s]

If you want to use `TFRobertaLMHeadModel` as a standalone, add `is_decoder=True.`
All model checkpoint layers were used when initializing TFRobertaForCausalLM.

All the layers of TFRobertaForCausalLM were initialized from the model checkpoint at distilroberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForCausalLM for predictions without further training.


**Compile**

In [26]:
import tensorflow as tf
model.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour, please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


**Fit**

In [None]:
model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)

Epoch 1/3
  4/463 [..............................] - ETA: 2:55:16 - loss: 16.3984