In [39]:
#default_exp lstm


# Few-Shot LSTM Fine-Tuning

> Training a fresh LSTM classifier at test time for a single prediction

It should be possible to fine-tune at "runtime" with a small number of examples from the training set.

Since the total training time must be under a minute, the model *cannot* be as large as BERT or GPT. We'll therefore train much simpler, smaller models that are known to converge more reliably. This matters because we don't know ahead of time what the "training" data will be, since it is user-supplied. The model also needs to generalize from a small number of examples, for which large transformers may not be the best option.

`fastai` is well-suited to this kind of rapid training, where converging across many samples with minimal configuration matters more than strictly obtaining the highest possible accuracy on the runtime training set.

In [40]:
#export
from ought.starter import *
import fastai
from fastai.text.all import *

`fastai` is easy to work with when you adhere to its `DataLoaders` format. So first, convert the JSON data into a pandas `DataFrame`.

> Note: `fastai` handles importing common libraries like `pandas` and `matplotlib` under the usual namespaces, which is why you won't see those here.
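
`load_jsonl` comes from `ought.starter` and is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming one JSON object per line (the name `load_jsonl_sketch` and the sample records are illustrative, not the project's actual code):

```python
import json
import os
import tempfile

def load_jsonl_sketch(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Illustrative records only; the real files hold paper abstracts and labels
records = [
    {"text": "a first abstract", "label": False},
    {"text": "a second abstract", "label": True},
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl")
    with open(sample_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    loaded = load_jsonl_sketch(sample_path)

print(loaded[0]["text"], loaded[1]["label"])
```

A list of dicts in this shape can be passed straight to `pd.DataFrame`.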

In [2]:
path = Path('data/')

train = load_jsonl('data/train.jsonl')
valid = load_jsonl('data/dev.jsonl')

train_df = pd.DataFrame(train)
valid_df = pd.DataFrame(valid)

train_df['is_valid'] = False
valid_df['is_valid'] = True

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([train_df, valid_df], ignore_index=True)
df.head()

Unnamed: 0,label,text,meta,is_valid
0,False,thermodynamic analysis of quantum error correcting engines. quantum error correcting codes can be cast in a way which is strikingly similar to a quantum heat engine undergoing an otto cycle. in this paper we strengthen this connection further by carrying out a complete assessment of the thermodynamic properties of strokes operator based error correcting codes. this includes an expression for the entropy production in the cycle which as we show contains clear contributions stemming from the different sources of irreversibility. to illustrate our results we study a classical qubit error corr...,"{'id': '1911.06354', 'year': 2019}",False
1,False,nlo qcd corrections to wzjj production at the lhc. we present a summary of the first calculation of nlo qcd corrections to wzjj production with leptonic decays at the lhc. our results show that the next to leading order corrections reduce significantly the scale uncertainties.,"{'id': '1310.4369', 'year': 2013}",False
2,False,asymptotics for lipschitz percolation above tilted planes. we consider lipschitz percolation in dimensions above planes tilted by an angle along one or several coordinate axes. in particular we are interested in the asymptotics of the critical probability as as well as our principal results show that the convergence of the critical probability to is polynomial as and in addition we identify the correct order of this polynomial convergence and in we also obtain the correct prefactor.,"{'id': '1504.05405', 'year': 2015}",False
3,False,the colored jones polynomials for bridge links. kuperberg introduced web spaces for some lie algebras which are generalizations of the kauffman bracket skein module on a disk with marked points. we derive some formulas for and clasped web spaces by graphical calculus using skein theory. these formulas are colored version of skein relations twist formulas and bubble skein expansion formulas. we calculate the and colored jones polynomials of bridge knots and links explicitly using twist formulas.,"{'id': '1609.07289', 'year': 2016}",False
4,False,population mixtures and searches of lensed and extended quasars across photometric surveys. wide field photometric surveys enable searches of rare yet interesting objects such as strongly lensed quasars or quasars with a bright host galaxy. past searches for lensed quasars based on their optical and near infrared properties have relied on photometric cuts and spectroscopic pre selection as in the sloan quasar lens search or neural networks applied to photometric samples. these methods rely on cuts in morphology and colours with the risk of losing many interesting objects due to scatter in ...,"{'id': '1612.03821', 'year': 2016}",False


Given the number of samples, it should be possible to fine-tune both a language model *and* a classifier in under a minute, but it's not clear whether the extra language-model step pays off. So let's try both.

## Fine-Tuning Language Model

In [8]:
dls_lm = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid', is_lm=True)
dls_lm.show_batch(max_n=3)


Unnamed: 0,text,text_
0,xxbos on xxunk s problem on the classifications of convex lattice polytopes . in xxunk . xxunk studied the classification problem for convex lattice polygons of given area . since then this problem and its analogues have been studied by b ar any xxunk xxunk xxunk xxunk and others . upper bounds for the numbers of non equivalent xxunk convex lattice polytopes of given volume or cardinality have been achieved . in,on xxunk s problem on the classifications of convex lattice polytopes . in xxunk . xxunk studied the classification problem for convex lattice polygons of given area . since then this problem and its analogues have been studied by b ar any xxunk xxunk xxunk xxunk and others . upper bounds for the numbers of non equivalent xxunk convex lattice polytopes of given volume or cardinality have been achieved . in this
1,squeezing of atomic ensembles in free space we xxunk on unique features that arise in the nanofiber geometry including anisotropy of both the intensity and polarization of the guided modes . we use a first principles stochastic xxunk equation to model the squeezing as function of time in the presence of xxunk due to optical xxunk . we find a peak xxunk squeezing of ~ db is achievable with current technology for,of atomic ensembles in free space we xxunk on unique features that arise in the nanofiber geometry including anisotropy of both the intensity and polarization of the guided modes . we use a first principles stochastic xxunk equation to model the squeezing as function of time in the presence of xxunk due to optical xxunk . we find a peak xxunk squeezing of ~ db is achievable with current technology for ~
2,we also extend the factorized resummation of multipolar amplitudes to generic mass ratio non precessing spinning black holes . lastly in our study we employ new recently computed higher order post newtonian terms in several xxunk modes and compute explicit expressions for the half and one and half post newtonian contributions to the odd parity current and even parity odd xxunk respectively . those results can be used to build more accurate,also extend the factorized resummation of multipolar amplitudes to generic mass ratio non precessing spinning black holes . lastly in our study we employ new recently computed higher order post newtonian terms in several xxunk modes and compute explicit expressions for the half and one and half post newtonian contributions to the odd parity current and even parity odd xxunk respectively . those results can be used to build more accurate templates


In [9]:
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.5, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()
learn.fine_tune(5)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,5.814971,5.548374,0.199942,256.819672,00:02


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,5.507803,5.307096,0.203712,201.76355,00:02
1,5.354535,5.116339,0.212079,166.723907,00:02
2,5.215483,5.041849,0.218871,154.75592,00:02
3,5.104574,5.015539,0.221631,150.737381,00:02
4,5.026503,5.011507,0.221808,150.130814,00:02
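
The `perplexity` column is the exponential of the validation loss, so either column can be recovered from the other. A quick check against the values logged above, using only the standard library:

```python
import math

# (valid_loss, perplexity) pairs copied from the training log above
logged = [
    (5.548374, 256.819672),
    (5.307096, 201.76355),
    (5.116339, 166.723907),
    (5.041849, 154.75592),
    (5.015539, 150.737381),
    (5.011507, 150.130814),
]

for valid_loss, perplexity in logged:
    # Perplexity is exp(cross-entropy loss)
    recomputed = math.exp(valid_loss)
    print(f"{valid_loss:.6f} -> {recomputed:.2f} (logged {perplexity:.2f})")
```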


In [10]:
learn.save_encoder('finetuned')

## Adding Classification Head to Language Model

In [6]:
dls = TextDataLoaders.from_df(df, path="data", text_col='text', label_col='label', valid_col='is_valid', seq_len=50)
dls.show_batch(max_n=3)


Unnamed: 0,text,category
0,xxbos evaluation of peak wall stress in an ascending thoracic aortic xxunk using fsi simulations effects of aortic stiffness and peripheral resistance . purpose . it has been reported xxunk that rupture or xxunk in thoracic aortic xxunk taa often occur due to xxunk which may be modelled with sudden increase of peripheral resistance inducing xxunk changes of blood volumes in the xxunk . there is clinical evidence that more compliant xxunk are less prone to rupture as they can xxunk such changes of volume . the aim of the current paper is to verify this paradigm by evaluating computationally the role played by the variation of peripheral resistance and the impact of aortic stiffness onto peak wall stress in ascending taa . methods . fluid structure interaction fsi analyses were performed using xxunk specific geometries and boundary conditions derived from 4d mri datasets acquired on a xxunk . blood,False
1,xxbos grain opacity and the bulk composition of extrasolar planets . ii . an analytical model for the grain opacity in protoplanetary atmospheres . context . we investigate the grain opacity k gr in the atmosphere of xxunk . this is important for the planetary mass radius relation since k gr affects the h he envelope mass of low mass planets and the critical core mass of giant planets . aims . the goal of this study is to derive an analytical model for k gr . methods . our model is based on the comparison of the timescales of xxunk processes like grain settling in the stokes and xxunk regime growth by brownian motion xxunk and differential settling grain evaporation and grain xxunk due to envelope contraction . with these timescales we derive the grain size abundance and opacity . results . we find that the main growth process,False
2,xxbos evaluating the applicability of the fokker planck equation in polymer translocation a brownian dynamics study . brownian dynamics xxunk simulations are used to study the translocation dynamics of a coarse grained polymer through a xxunk nanopore . we consider the case of short xxunk with a polymer length n in the range n= . the rate of translocation is controlled by a tunable friction coefficient gamma 0p for monomers inside the nanopore . in the case of xxunk translocation the mean translocation time scales with polymer length n as < tau > ~ n n p xxunk where n p is the average number of monomers in the nanopore . the exponent approaches the value alpha= when the pore friction is sufficiently high in xxunk with the prediction for the case of the quasi static regime where pore friction xxunk . in the case of xxunk translocation the polymer,False


In [7]:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

In [8]:
learn = learn.load_encoder('finetuned')

In [9]:
learn.fine_tune(5, 5e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.7089,0.505493,0.892,00:02


epoch,train_loss,valid_loss,accuracy,time
0,0.522991,0.342583,0.898,00:04
1,0.391128,0.430593,0.89,00:04
2,0.316259,0.382648,0.89,00:04
3,0.251506,0.385069,0.894,00:04
4,0.196436,0.336438,0.896,00:04


## Fine-Tuning Classifier from Scratch

Now, we'll train the classifier on its own and see if the performance is significantly worse.

In [3]:
dls = TextDataLoaders.from_df(df, path="data", text_col='text', label_col='label', valid_col='is_valid', seq_len=50)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(5, 5e-2)


epoch,train_loss,valid_loss,accuracy,time
0,0.708454,0.406301,0.902,00:03


epoch,train_loss,valid_loss,accuracy,time
0,0.592001,0.362658,0.89,00:04
1,0.439429,0.365592,0.89,00:04
2,0.32314,0.576635,0.89,00:04
3,0.249769,0.265709,0.914,00:04
4,0.195983,0.269402,0.912,00:04


Surprisingly, fine-tuning the classifier on its own does slightly better here than fine-tuning the language model first (0.912 vs. 0.896 final validation accuracy). This is good news, since it means we can allocate more time to training the classifier.
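
To gauge whether a gap of this size is meaningful, a rough back-of-the-envelope check helps. The sketch below assumes a validation set of 500 examples (an assumption inferred from the accuracies being multiples of 0.002, not something the notebook states) and treats the two runs as independent samples, which is a simplification:

```python
import math

N_VALID = 500  # assumed validation-set size; not stated in the notebook

def accuracy_gap_z(acc_a, acc_b, n=N_VALID):
    """Approximate z-score for the difference between two accuracies."""
    # Binomial variance of each accuracy estimate
    var_a = acc_a * (1 - acc_a) / n
    var_b = acc_b * (1 - acc_b) / n
    return (acc_b - acc_a) / math.sqrt(var_a + var_b)

# Final-epoch accuracies: language model + classifier vs. classifier alone
z = accuracy_gap_z(0.896, 0.912)
print(f"z = {z:.2f}")
```

Under these assumptions the gap is less than one standard error, so the two approaches are probably best read as roughly on par, with the classifier-only route being no worse.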

Finally, as a sanity check, we can see some sample predictions.

In [7]:
learn.show_results()

Unnamed: 0,text,category,category_
0,xxbos the energetics of giant radio galaxy lobes from inverse compton scattering observations . giant radio galaxy grg lobes are excellent laboratories to study the evolution of the particle and b field energetics . however these results are based on assumptions of the shape and extension of the grg lobe electron spectrum . we re examine the energetics of grg lobes as derived by inverse compton scattering of cmb photons ics cmb by relativistic electrons in rg lobes to assess the physical conditions of rg lobes their energetics and their radiation regime . we consider the grg da recently observed by xxunk as a reference case and we also discuss other rg lobes observed with chandra and xxunk . we model the spectral energy distribution of the da xxunk lobe to get constraint on the shape and the extension of the electron spectrum in the lobe by using multi frequency,False,False
1,xxbos asymptotic normality and xxunk in estimation of large gaussian graphical models . the gaussian graphical model a popular paradigm for studying relationship among variables in a wide range of applications has attracted great attention in recent years . this paper considers a fundamental question when is it possible to estimate low dimensional parameters at parametric square root rate in a large gaussian graphical model a novel regression approach is proposed to obtain asymptotically efficient estimation of each entry of a precision matrix under a xxunk condition relative to the sample size . when the precision matrix is not sufficiently sparse or xxunk the sample size is not sufficiently large a lower bound is established to show that it is no longer possible to achieve the parametric rate in the estimation of each entry . this lower bound result which provides an answer to the xxunk sample size question is,True,False
2,xxbos a large scale structure traced by oii emitters hosting a distant cluster at xxunk we present a xxunk narrow band imaging survey of oii emitters in and around the xxunk xxunk cluster at z= with xxunk xxunk on xxunk telescope . oii emitters were identified on the basis of narrow band excesses and photometric redshifts . we discovered a huge xxunk structure with some xxunk traced by oii emitters and found that the xxunk xxunk cluster is embedded in an even larger super structure than the one reported previously . oii emitters were spectroscopically confirmed with the detection of h alpha and or o xxrep 3 i emission lines by xxunk observations . in the high density regions such as cluster core and xxunk star forming oii emitters show a high xxunk by a factor of more than compared to the field region . although the star formation activity,False,False
3,xxbos xxunk based low delay live streaming using throughput predictions . recently http based adaptive streaming has become the de xxunk standard for video streaming over the internet . it allows xxunk to dynamically adapt media characteristics to network conditions in order to ensure a high quality of experience that is minimize xxunk xxunk while maximizing video quality at a reasonable level of quality changes . in the case of live streaming this task becomes particularly challenging due to the latency constraints . the challenge further increases if a xxunk uses a wireless network where the throughput is subject to considerable fluctuations . consequently live xxunk often exhibit xxunk of up to seconds . in the present work we introduce an adaptation algorithm for http based live streaming called lolypop low latency prediction based adaptation that is designed to operate with a transport latency of few seconds . to reach,False,False
4,xxbos large eddy simulations of turbulent flow for grid to rod xxunk in nuclear reactors . the grid to rod xxunk gtrf problem in xxunk water reactors is a flow induced vibration problem that results in xxunk and failure of the fuel xxunk in nuclear xxunk . in order to understand the fluid dynamics of gtrf and to build an archival database of turbulence statistics for various configurations implicit large eddy simulations of time dependent single phase turbulent flow have been performed in xxunk and xxunk rod bundles with a single grid xxunk . to assess the computational mesh and resolution requirements a method for quantitative assessment of xxunk meshes with no slip walls is described . the calculations have been carried out using hydra th a thermal xxunk code developed at los xxunk for the xxunk for advanced simulation of light water reactors a united states xxunk of energy,False,False
5,xxbos observation of the extremely bright flare of the fsrq xxunk with h.e.s.s . ii . in june the flat spectrum radio quasar xxunk xxunk an extremely bright gamma ray flare with an increase of the flux above mev by a factor in less than day revealing an intrinsic variability timescale of minutes as detected by the fermi lat . we present results of target of opportunity observations with the h.e.s.s . experiment on this source over the nights around the peak of the outburst . the h.e.s.s . data were analysed with mono and stereo chains . thanks to the extreme brightness of the source at gev energies it was possible to obtain data from fermi lat strictly simultaneous to the h.e.s.s . observation . simultaneous and quasi simultaneous observations at optical and x ray energies were xxunk to reconstruct the multi wavelength spectrum xxunk to constrain theoretical models,False,False
6,xxbos search for an xxunk in sodium and calcium in the transmission spectrum of xxunk cancri e. xxunk the aim of this work is to search for an absorption signal from exospheric sodium xxunk and xxunk ionized calcium ca in the optical transmission spectrum of the hot xxunk super earth cancri e. although the current best fitting models to the planet mass and radius require a possible atmospheric component uncertainties in the radius exist making it possible that cancri e could be a hot xxunk planet without an atmosphere . high resolution r time series spectra of five transits of cancri e obtained with three different telescopes xxunk vlt harps eso xxunk m harps n xxunk were analysed . targeting the sodium d lines and the calcium h and k lines the potential planet exospheric signal was filtered out from the much stronger stellar and xxunk signals making use of,False,False
7,xxbos interpolating helicity spinors between the instant form and the light front form . we discuss the helicity spinors interpolating between the instant form dynamics ifd and the front form dynamics or the light front dynamics lfd and present the interpolating helicity amplitudes as well as their xxunk for the scattering of two fermions and the annihilation of fermion and anti fermion . we xxunk the interpolation between the two dynamics ifd and lfd by an interpolation angle and derive not only the generalized helicity spinors in the chiral representation that links naturally the two typical ifd vs. lfd helicity spinors but also the generalized xxunk transformation that relates these generalized helicity spinors to the usual dirac spinors . analyzing the directions of the particle momentum and spin with the variation of the interpolation angle we xxunk the whole xxunk of the generalized helicity xxunk between the usual xxunk xxunk,False,False
8,xxbos xxunk xxunk drifting objects using an iterative algorithm with a forward trajectory model . the task of determining the origin of a drifting object after it has been located is highly complex due to the uncertainties in drift properties and environmental forcing wind waves and surface currents . usually the origin is inferred by running a trajectory model stochastic or deterministic in reverse . however this approach has some xxunk drawbacks most notably the fact that many drifting objects go through nonlinear state changes xxunk e.g. xxunk oil or a xxunk xxunk . this makes it difficult to naively construct a reverse time trajectory model which xxunk predicts the xxunk possible time the object may have started drifting . we propose instead a different approach where the original forward trajectory model is xxunk xxunk while an iterative xxunk and selection process allows us to retain only those particles that,False,False


## Refactor into a Single Class

We can refactor all this and export it as a single class with two useful methods:

- An initializer that will retrain a new model for *every* new instance. This is intended, since we do not know the training set ahead of time. One potential improvement here would be to continuously train on every new `.jsonl` file that comes in and save the weights, but there is not enough data for that here. 
- A `predict` method that takes in a sentence and returns a prediction by querying the trained model.

In [36]:
#export
class LSTMClassifier:
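    def __init__(self, json='data/train.jsonl', samples=5, metrics=None):
        # Avoid a mutable default argument; fall back to no metrics
        metrics = [] if metrics is None else metrics
        self.path = json
        # uniform_samples comes from ought.starter
        self.df = pd.DataFrame(uniform_samples(json, samples))
        self.dls = TextDataLoaders.from_df(self.df, path=json, text_col='text', label_col='label', valid_col=None, seq_len=50)
        self.learn = text_classifier_learner(self.dls, AWD_LSTM, drop_mult=0.5, metrics=metrics)
        # Training happens at construction time, once per instance
        self.learn.fine_tune(5, 5e-2)

    def predict(self, prompt):
        # learn.predict returns (decoded label, label index, probabilities)
        pred = self.learn.predict(prompt)[0]
        return 'NOT AI' if (pred == 'False') else 'AI'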
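
The string comparison in `predict` depends on how the label is decoded. A hypothetical helper (`to_verdict` is not part of the notebook or of `fastai`) that accepts either a boolean or its string form might look like this:

```python
def to_verdict(label):
    """Map a decoded label (bool or its string form) to a readable verdict."""
    # str() normalises True/False and 'True'/'False' to the same text
    is_ai = str(label).strip().lower() == "true"
    return "AI" if is_ai else "NOT AI"

for label in (False, "False", True, "True"):
    print(repr(label), "->", to_verdict(label))
```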

> Note: you might have to restart the notebook to clear GPU memory at this point

In [34]:
test = load_jsonl("data/test_no_labels.jsonl")
example = test[0]
prompt = example['text']
prompt

'out of plane effect on the superconductivity of sr2 xbaxcuo3+d with tc up to 98k. we comment on the paper published by w.b. gao q.q. liu l.x. yang y.yu f.y. li c.q. jin and s. uchida in phys. rev. b and give alternate explanations for the enhanced superconductivity. the enhanced onset tc of 98k observed upon substituting ba for sr is attributed to optimal oxygen ordering rather than to the increase in volume. comparison with la2cuo +x samples suggest that the effect of disorder is overestimated.'

In [37]:
%%time
clas = LSTMClassifier(metrics=[accuracy])
pred = clas.predict(prompt)


epoch,train_loss,valid_loss,accuracy,time
0,0.767748,0.685123,0.7,00:01


epoch,train_loss,valid_loss,accuracy,time
0,0.542596,1.003131,0.3,00:01
1,0.654398,0.495892,0.7,00:01
2,0.649306,0.447348,1.0,00:01
3,0.573689,0.476737,1.0,00:01
4,0.522111,0.466225,0.95,00:01


CPU times: user 10.5 s, sys: 7.69 s, total: 18.2 s
Wall time: 16.6 s
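
`%%time` only works inside a notebook. Outside one, the same wall-clock check against the one-minute budget could be sketched as follows (`run_with_budget` and `BUDGET_SECONDS` are illustrative names, not part of the project):

```python
import time

BUDGET_SECONDS = 60.0  # the "under a minute" limit from the introduction

def run_with_budget(fn, budget=BUDGET_SECONDS):
    """Call fn() and report its result, wall-clock time, and budget status."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget

# Stand-in workload; in practice this would be the train-and-predict step
result, elapsed, ok = run_with_budget(lambda: sum(range(100_000)))
print(f"took {elapsed:.3f}s, within budget: {ok}")
```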


In [38]:
pred

'Not AI'