#Live Week 8 - Simple Information Retrieval and Context Size

Imagine your task is to analyze long documents in various ways. You may need to extract information, or you may need to get questions answered. Now you get access to a new model which seems like a good fit, and you want to get a sense  of whether it may be a candidate for your task.

One area you may want to look at is whether the model is able to retrieve information and provide correct answers to questions contained in the context, and how this changes as the context size grows. Does a lot of extra information make it harder to get the answer? Is the model good to the maximal advertised context size? (Ideally you want to use a suitable test set, but let us look at it manually with a simple example.)

We copied the Wikipedia article on the '[Big Bang](
  https://en.wikipedia.org/wiki/Big_Bang)' , removed some equations, and extracted two paragraphs that i) contain a specific piece of information (~"who came up with the name 'Big Bang'") and ii) a preferred model of that person. The article has about 11,500k tokens.
  
  We will use example contexts consisting of the article limited a fraction of lines ($keep\_fraction\_main\_text$), with the paragraphs containing the information that we will ask about inserted into the context  after fractions $rel\_insert\_pos\_info, rel\_insert\_pos\_rel$ of the remaining lines. The question and some wrappers are added to the context, and see whether the model has problems to retrieve the answers, either at certain context length or at certain positions.

  Of course this is a very pedestrian approach, but it should illustrate the idea.

  We will use the 4-bit quantized version of [Qwen2 1.5bn-Instruct ](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct), using the Apache 2.0 license. We can test the model here - for memory constraints - with a context length of about 15000 tokens using a T4 processor. **It would be interesting to go up to 32k tokens, for which we would need an A-100 processor.**
  
Note that:

 **i) this simple test is not an assessment of the model's quality. If we find limitations, this would rather point to starting points to look at the prompting and usage guides of the model**

 **ii) This is a small model, so you would not expect it to be perfect. This is intentional to illustrate the points and also dictated by the limited hardware. **

###1. Get the Model

In [1]:
%%capture
!pip install portalocker
!pip install accelerate
!pip install bitsandbytes
#!pip install -q -U git+https://github.com/huggingface/transformers.git

In [2]:
from transformers import BitsAndBytesConfig

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import shutil, os, subprocess

Get the model in a quantized format:

In [3]:
%%capture


nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16,
   bnb_4bit_use_double_quant_linear=True
)

qwen_2_1_5b_inst_4bit = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct", quantization_config=nf4_config)
qwen_2_1_5b_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

###2. Set Up Test Data

We start by defining the question text, the paragraphs with the base information and the paragraph that relates to it, and some wrapper components.

We then copy and clean the Big Bang Wikipedia article.

In [4]:
question_text = """Based on the text above who coined the term 'Big Bang' (not the model itself)? And which other cosmological model did that person prefer? Give a short answer covering both questions"""

intro_text = """Here is a text:\n---\n"""

outro_text = """\n---\n"""

info_text = """English astronomer Fred Hoyle is credited with coining the term "Big Bang" during a talk for a March 1949 BBC Radio broadcast,[42] saying: "These theories were based on the hypothesis that all the matter in the universe was created in one big bang at a particular time in the remote past."[43][44] However, it did not catch on until the 1970s.[44]."""

relation_text = """
It is popularly reported that English astronomer Fred Hoyle, who favored an alternative "steady-state" cosmological model, intended this to be pejorative,[45][46][47] but Hoyle explicitly denied this and said it was just a striking image meant to highlight the difference between the two models.[48][49][51] Helge Kragh writes that the evidence for the claim that it was meant as a pejorative is "unconvincing", and mentions a number of indications that it was not a pejorative.[44]
"""


In [5]:
base_text = """
Big Bang

The Big Bang is a physical theory that describes how the universe expanded from an initial state of high density and temperature.[1] The Big Bang theory was inspired by the discovery of the expanding Universe by Edwin Hubble. It was first proposed in 1927 by Roman Catholic priest and physicist Georges Lemaître. Lemaître reasoned that if we go back in time, there must be fewer and fewer matter, until all the energy of the universe is packed in a unique quantum.[2] Various cosmological models of the Big Bang explain the evolution of the observable universe from the earliest known periods through its subsequent large-scale form.[3][4][5] These models offer a comprehensive explanation for a broad range of observed phenomena, including the abundance of light elements, the cosmic microwave background (CMB) radiation, and large-scale structure. The overall uniformity of the universe, known as the flatness problem, is explained through cosmic inflation: a sudden and very rapid expansion of space during the earliest moments. However, physics currently lacks a widely accepted theory of quantum gravity that can successfully model the earliest conditions of the Big Bang.

Crucially, these models are compatible with the Hubble–Lemaître law—the observation that the farther away a galaxy is, the faster it is moving away from Earth. Extrapolating this cosmic expansion backwards in time using the known laws of physics, the models describe an increasingly concentrated cosmos preceded by a singularity in which space and time lose meaning (typically named "the Big Bang singularity").[6] In 1964 the CMB was discovered, which convinced many cosmologists that the competing steady-state model of cosmic evolution was falsified,[7] since the Big Bang models predict a uniform background radiation caused by high temperatures and densities in the distant past. A wide range of empirical evidence strongly favors the Big Bang event, which is now essentially universally accepted.[8] Detailed measurements of the expansion rate of the universe place the Big Bang singularity at an estimated 13.787±0.020 billion years ago, which is considered the age of the universe.[9]

There remain aspects of the observed universe that are not yet adequately explained by the Big Bang models. After its initial expansion, the universe cooled sufficiently to allow the formation of subatomic particles, and later atoms. The unequal abundances of matter and antimatter that allowed this to occur is an unexplained effect known as baryon asymmetry. These primordial elements—mostly hydrogen, with some helium and lithium—later coalesced through gravity, forming early stars and galaxies. Astronomers observe the gravitational effects of an unknown dark matter surrounding galaxies. Most of the gravitational potential in the universe seems to be in this form, and the Big Bang models and various observations indicate that this excess gravitational potential is not created by baryonic matter, such as normal atoms. Measurements of the redshifts of supernovae indicate that the expansion of the universe is accelerating, an observation attributed to an unexplained phenomenon known as dark energy.[10]

Features of the models
The Big Bang models offer a comprehensive explanation for a broad range of observed phenomena, including the abundances of the light elements, the CMB, large-scale structure, and Hubble's law.[11] The models depend on two major assumptions: the universality of physical laws and the cosmological principle. The universality of physical laws is one of the underlying principles of the theory of relativity. The cosmological principle states that on large scales the universe is homogeneous and isotropic—appearing the same in all directions regardless of location.[12]

These ideas were initially taken as postulates, but later efforts were made to test each of them. For example, the first assumption has been tested by observations showing that the largest possible deviation of the fine-structure constant over much of the age of the universe is of order 10−5.[13] Also, general relativity has passed stringent tests on the scale of the Solar System and binary stars.[14][15][notes 1]

The large-scale universe appears isotropic as viewed from Earth. If it is indeed isotropic, the cosmological principle can be derived from the simpler Copernican principle, which states that there is no preferred (or special) observer or vantage point. To this end, the cosmological principle has been confirmed to a level of 10−5 via observations of the temperature of the CMB. At the scale of the CMB horizon, the universe has been measured to be homogeneous with an upper bound on the order of 10% inhomogeneity, as of 1995.[16]

Horizons
Main article: Cosmological horizon
An important feature of the Big Bang spacetime is the presence of particle horizons. Since the universe has a finite age, and light travels at a finite speed, there may be events in the past whose light has not yet had time to reach earth. This places a limit or a past horizon on the most distant objects that can be observed. Conversely, because space is expanding, and more distant objects are receding ever more quickly, light emitted by us today may never "catch up" to very distant objects. This defines a future horizon, which limits the events in the future that we will be able to influence. The presence of either type of horizon depends on the details of the FLRW model that describes our universe.[17]

Our understanding of the universe back to very early times suggests that there is a past horizon, though in practice our view is also limited by the opacity of the universe at early times. So our view cannot extend further backward in time, though the horizon recedes in space. If the expansion of the universe continues to accelerate, there is a future horizon as well.[17]

Thermalization
Some processes in the early universe occurred too slowly, compared to the expansion rate of the universe, to reach approximate thermodynamic equilibrium. Others were fast enough to reach thermalization. The parameter usually used to find out whether a process in the very early universe has reached thermal equilibrium is the ratio between the rate of the process (usually rate of collisions between particles) and the Hubble parameter. The larger the ratio, the more time particles had to thermalize before they were too far away from each other.[18]

Timeline
Main article: Chronology of the universe
External Timeline	A graphical timeline is available at
Graphical timeline of the Big Bang
According to the Big Bang models, the universe at the beginning was very hot and very compact, and since then it has been expanding and cooling.

Singularity
See also: Gravitational singularity, Initial singularity, and Planck units § Cosmology
Extrapolation of the expansion of the universe backwards in time using general relativity yields an infinite density and temperature at a finite time in the past.[19] This irregular behavior, known as the gravitational singularity, indicates that general relativity is not an adequate description of the laws of physics in this regime. Models based on general relativity alone cannot fully extrapolate toward the singularity.[6] In some proposals, such as the emergent Universe models, the singularity is replaced by another cosmological epoch. A different approach identifies the initial singularity as a singularity predicted by some models of the Big Bang theory to have existed before the Big Bang.[20][clarification needed]

This primordial singularity is itself sometimes called "the Big Bang",[21] but the term can also refer to a more generic early hot, dense phase[22][notes 2] of the universe. In either case, "the Big Bang" as an event is also colloquially referred to as the "birth" of our universe since it represents the point in history where the universe can be verified to have entered into a regime where the laws of physics as we understand them (specifically general relativity and the Standard Model of particle physics) work. Based on measurements of the expansion using Type Ia supernovae and measurements of temperature fluctuations in the cosmic microwave background, the time that has passed since that event—known as the "age of the universe"—is 13.8 billion years.[23]

Despite being extremely dense at this time—far denser than is usually required to form a black hole—the universe did not re-collapse into a singularity. Commonly used calculations and limits for explaining gravitational collapse are usually based upon objects of relatively constant size, such as stars, and do not apply to rapidly expanding space such as the Big Bang. Since the early universe did not immediately collapse into a multitude of black holes, matter at that time must have been very evenly distributed with a negligible density gradient.[24]

Inflation and baryogenesis
Main articles: Inflation (cosmology) and Baryogenesis
The earliest phases of the Big Bang are subject to much speculation, since astronomical data about them are not available. In the most common models the universe was filled homogeneously and isotropically with a very high energy density and huge temperatures and pressures, and was very rapidly expanding and cooling. The period up to 10−43 seconds into the expansion, the Planck epoch, was a phase in which the four fundamental forces—the electromagnetic force, the strong nuclear force, the weak nuclear force, and the gravitational force, were unified as one.[25] In this stage, the characteristic scale length of the universe was the Planck length, 1.6×10−35 m, and consequently had a temperature of approximately 1032 degrees Celsius. Even the very concept of a particle breaks down in these conditions. A proper understanding of this period awaits the development of a theory of quantum gravity.[26][27] The Planck epoch was succeeded by the grand unification epoch beginning at 10−43 seconds, where gravitation separated from the other forces as the universe's temperature fell.[25]

At approximately 10−37 seconds into the expansion, a phase transition caused a cosmic inflation, during which the universe grew exponentially, unconstrained by the light speed invariance, and temperatures dropped by a factor of 100,000. This concept is motivated by the flatness problem, where the density of matter and energy is very close to the critical density needed to produce a flat universe. That is, the shape of the universe has no overall geometric curvature due to gravitational influence. Microscopic quantum fluctuations that occurred because of Heisenberg's uncertainty principle were "frozen in" by inflation, becoming amplified into the seeds that would later form the large-scale structure of the universe.[28] At a time around 10−36 seconds, the electroweak epoch begins when the strong nuclear force separates from the other forces, with only the electromagnetic force and weak nuclear force remaining unified.[29]

Inflation stopped locally at around 10−33 to 10−32 seconds, with the observable universe's volume having increased by a factor of at least 1078. Reheating occurred until the universe obtained the temperatures required for the production of a quark–gluon plasma as well as all other elementary particles.[30][31] Temperatures were so high that the random motions of particles were at relativistic speeds, and particle–antiparticle pairs of all kinds were being continuously created and destroyed in collisions.[1] At some point, an unknown reaction called baryogenesis violated the conservation of baryon number, leading to a very small excess of quarks and leptons over antiquarks and antileptons—of the order of one part in 30 million. This resulted in the predominance of matter over antimatter in the present universe.[32]

Cooling
Main articles: Big Bang nucleosynthesis and Cosmic microwave background
A map of the universe, with specks and strands of light of different colors.
Panoramic view of the entire near-infrared sky reveals the distribution of galaxies beyond the Milky Way. Galaxies are color-coded by redshift.
The universe continued to decrease in density and fall in temperature, hence the typical energy of each particle was decreasing. Symmetry-breaking phase transitions put the fundamental forces of physics and the parameters of elementary particles into their present form, with the electromagnetic force and weak nuclear force separating at about 10−12 seconds.[29][33]

After about 10−11 seconds, the picture becomes less speculative, since particle energies drop to values that can be attained in particle accelerators. At about 10−6 seconds, quarks and gluons combined to form baryons such as protons and neutrons. The small excess of quarks over antiquarks led to a small excess of baryons over antibaryons. The temperature was no longer high enough to create either new proton–antiproton or neutron–antineutron pairs. A mass annihilation immediately followed, leaving just one in 108 of the original matter particles and none of their antiparticles.[34] A similar process happened at about 1 second for electrons and positrons. After these annihilations, the remaining protons, neutrons and electrons were no longer moving relativistically and the energy density of the universe was dominated by photons (with a minor contribution from neutrinos).

A few minutes into the expansion, when the temperature was about a billion kelvin and the density of matter in the universe was comparable to the current density of Earth's atmosphere, neutrons combined with protons to form the universe's deuterium and helium nuclei in a process called Big Bang nucleosynthesis (BBN).[35] Most protons remained uncombined as hydrogen nuclei.[36]

As the universe cooled, the rest energy density of matter came to gravitationally dominate that of the photon radiation. After about 379,000 years, the electrons and nuclei combined into atoms (mostly hydrogen), which were able to emit radiation. This relic radiation, which continued through space largely unimpeded, is known as the cosmic microwave background.[36]

Structure formation
Main article: Structure formation

Abell 2744 galaxy cluster – Hubble Frontier Fields view[37]
Over a long period of time, the slightly denser regions of the uniformly distributed matter gravitationally attracted nearby matter and thus grew even denser, forming gas clouds, stars, galaxies, and the other astronomical structures observable today.[1] The details of this process depend on the amount and type of matter in the universe. The four possible types of matter are known as cold dark matter (CDM), warm dark matter, hot dark matter, and baryonic matter. The best measurements available, from the Wilkinson Microwave Anisotropy Probe (WMAP), show that the data is well-fit by a Lambda-CDM model in which dark matter is assumed to be cold. (Warm dark matter is ruled out by early reionization.)[38] This CDM is estimated to make up about 23% of the matter/energy of the universe, while baryonic matter makes up about 4.6%.[39]


Cosmic acceleration
Main article: Accelerating expansion of the universe
Independent lines of evidence from Type Ia supernovae and the CMB imply that the universe today is dominated by a mysterious form of energy known as dark energy, which appears to homogeneously permeate all of space. Observations suggest that 73% of the total energy density of the present day universe is in this form. When the universe was very young it was likely infused with dark energy, but with everything closer together, gravity predominated, braking the expansion. Eventually, after billions of years of expansion, the declining density of matter relative to the density of dark energy allowed the expansion of the universe to begin to accelerate.[10]

Dark energy in its simplest formulation is modeled by a cosmological constant term in Einstein field equations of general relativity, but its composition and mechanism are unknown. More generally, the details of its equation of state and relationship with the Standard Model of particle physics continue to be investigated both through observation and theory.[10]

All of this cosmic evolution after the inflationary epoch can be rigorously described and modeled by the lambda-CDM model of cosmology, which uses the independent frameworks of quantum mechanics and general relativity. There are no easily testable models that would describe the situation prior to approximately 10−15 seconds.[41] Understanding this earliest of eras in the history of the universe is one of the greatest unsolved problems in physics.

Concept history
Main article: History of the Big Bang theory
See also: Timeline of cosmological theories
Etymology

The term itself has been argued to be a misnomer because it evokes an explosion.[44][52] The argument is that whereas an explosion suggests expansion into a surrounding space, the Big Bang only describes the intrinsic expansion of the contents of the universe.[53][54] Another issue pointed out by Santhosh Mathew is that bang implies sound, which is not an important feature of the model.[46] An attempt to find a more suitable alternative was not successful.[44][47]

Development
Hubble eXtreme Deep Field (XDF)

XDF size compared to the size of the Moon (XDF is the small box to the left of, and nearly below, the Moon) – several thousand galaxies, each consisting of billions of stars, are in this small view.

XDF (2012) view – each light speck is a galaxy – some of these are as old as 13.2 billion years[56] – the universe is estimated to contain 200 billion galaxies.

XDF image shows fully mature galaxies in the foreground plane – nearly mature galaxies from 5 to 9 billion years ago – protogalaxies, blazing with young stars, beyond 9 billion years.
The Big Bang models developed from observations of the structure of the universe and from theoretical considerations. In 1912, Vesto Slipher measured the first Doppler shift of a "spiral nebula" (spiral nebula is the obsolete term for spiral galaxies), and soon discovered that almost all such nebulae were receding from Earth. He did not grasp the cosmological implications of this fact, and indeed at the time it was highly controversial whether or not these nebulae were "island universes" outside our Milky Way.[57][58] Ten years later, Alexander Friedmann, a Russian cosmologist and mathematician, derived the Friedmann equations from the Einstein field equations, showing that the universe might be expanding in contrast to the static universe model advocated by Albert Einstein at that time.[59]

In 1924, American astronomer Edwin Hubble's measurement of the great distance to the nearest spiral nebulae showed that these systems were indeed other galaxies. Starting that same year, Hubble painstakingly developed a series of distance indicators, the forerunner of the cosmic distance ladder, using the 100-inch (2.5 m) Hooker telescope at Mount Wilson Observatory. This allowed him to estimate distances to galaxies whose redshifts had already been measured, mostly by Slipher. In 1929, Hubble discovered a correlation between distance and recessional velocity—now known as Hubble's law.[60][61]

Independently deriving Friedmann's equations in 1927, Georges Lemaître, a Belgian physicist and Roman Catholic priest, proposed that the recession of the nebulae was due to the expansion of the universe.[62] He inferred the relation that Hubble would later observe, given the cosmological principle.[10] In 1931, Lemaître went further and suggested that the evident expansion of the universe, if projected back in time, meant that the further in the past the smaller the universe was, until at some finite time in the past all the mass of the universe was concentrated into a single point, a "primeval atom" where and when the fabric of time and space came into existence.[63]


If the world has begun with a single quantum, the notions of space and time would altogether fail to have any meaning at the beginning; they would only begin to have a sensible meaning when the original quantum had been divided into a sufficient number of quanta. If this suggestion is correct, the beginning of the world happened a little before the beginning of space and time.[68]

During the 1930s, other ideas were proposed as non-standard cosmologies to explain Hubble's observations, including the Milne model,[69] the oscillatory universe (originally suggested by Friedmann, but advocated by Albert Einstein and Richard C. Tolman)[70] and Fritz Zwicky's tired light hypothesis.[71]

In 1968 and 1970, Roger Penrose, Stephen Hawking, and George F. R. Ellis published papers where they showed that mathematical singularities were an inevitable initial condition of relativistic models of the Big Bang.[76][77] Then, from the 1970s to the 1990s, cosmologists worked on characterizing the features of the Big Bang universe and resolving outstanding problems. In 1981, Alan Guth made a breakthrough in theoretical work on resolving certain outstanding theoretical problems in the Big Bang models with the introduction of an epoch of rapid expansion in the early universe he called "inflation".[78] Meanwhile, during these decades, two questions in observational cosmology that generated much discussion and disagreement were over the precise values of the Hubble Constant[79] and the matter-density of the universe (before the discovery of dark energy, thought to be the key predictor for the eventual fate of the universe).[80]

In the mid-1990s, observations of certain globular clusters appeared to indicate that they were about 15 billion years old, which conflicted with most then-current estimates of the age of the universe (and indeed with the age measured today). This issue was later resolved when new computer simulations, which included the effects of mass loss due to stellar winds, indicated a much younger age for globular clusters.[81]

Significant progress in Big Bang cosmology has been made since the late 1990s as a result of advances in telescope technology as well as the analysis of data from satellites such as the Cosmic Background Explorer (COBE),[82] the Hubble Space Telescope and WMAP.[83] Cosmologists now have fairly precise and accurate measurements of many of the parameters of the Big Bang model, and have made the unexpected discovery that the expansion of the universe appears to be accelerating.[84][85]

Observational evidence
"[The] big bang picture is too firmly grounded in data from every area to be proved invalid in its general features."

— Lawrence Krauss[86]

The earliest and most direct observational evidence of the validity of the theory are the expansion of the universe according to Hubble's law (as indicated by the redshifts of galaxies), discovery and measurement of the cosmic microwave background and the relative abundances of light elements produced by Big Bang nucleosynthesis (BBN). More recent evidence includes observations of galaxy formation and evolution, and the distribution of large-scale cosmic structures,[87] These are sometimes called the "four pillars" of the Big Bang models.[88]

Precise modern models of the Big Bang appeal to various exotic physical phenomena that have not been observed in terrestrial laboratory experiments or incorporated into the Standard Model of particle physics. Of these features, dark matter is currently the subject of most active laboratory investigations.[89] Remaining issues include the cuspy halo problem[90] and the dwarf galaxy problem[91] of cold dark matter. Dark energy is also an area of intense interest for scientists, but it is not clear whether direct detection of dark energy will be possible.[92] Inflation and baryogenesis remain more speculative features of current Big Bang models. Viable, quantitative explanations for such phenomena are still being sought. These are unsolved problems in physics.


Hubble's law and the expansion of the universe
Main articles: Hubble's law and Expansion of the universe
See also: Distance measures (cosmology) and Scale factor (cosmology)

Hubble's law implies that the universe is uniformly expanding everywhere. This cosmic expansion was predicted from general relativity by Friedmann in 1922[59] and Lemaître in 1927,[62] well before Hubble made his 1929 analysis and observations, and it remains the cornerstone of the Big Bang model as developed by Friedmann, Lemaître, Robertson, and Walker.


An unexplained discrepancy with the determination of the Hubble constant is known as Hubble tension. Techniques based on observation of the CMB suggest a lower value of this constant compared to the quantity derived from measurements based on the cosmic distance ladder.[94]

Cosmic microwave background radiation
Main article: Cosmic microwave background

The cosmic microwave background spectrum measured by the FIRAS instrument on the COBE satellite is the most-precisely measured blackbody spectrum in nature.[95] The data points and error bars on this graph are obscured by the theoretical curve.
In 1964, Arno Penzias and Robert Wilson serendipitously discovered the cosmic background radiation, an omnidirectional signal in the microwave band.[75] Their discovery provided substantial confirmation of the big-bang predictions by Alpher, Herman and Gamow around 1950. Through the 1970s, the radiation was found to be approximately consistent with a blackbody spectrum in all directions; this spectrum has been redshifted by the expansion of the universe, and today corresponds to approximately 2.725 K. This tipped the balance of evidence in favor of the Big Bang model, and Penzias and Wilson were awarded the 1978 Nobel Prize in Physics.

The surface of last scattering corresponding to emission of the CMB occurs shortly after recombination, the epoch when neutral hydrogen becomes stable. Prior to this, the universe comprised a hot dense photon-baryon plasma sea where photons were quickly scattered from free charged particles. Peaking at around 372±14 kyr,[38] the mean free path for a photon becomes long enough to reach the present day and the universe becomes transparent.


9 year WMAP image of the cosmic microwave background radiation (2012).[96][97] The radiation is isotropic to roughly one part in 100,000.[98]
In 1989, NASA launched COBE, which made two major advances: in 1990, high-precision spectrum measurements showed that the CMB frequency spectrum is an almost perfect blackbody with no deviations at a level of 1 part in 104, and measured a residual temperature of 2.726 K (more recent measurements have revised this figure down slightly to 2.7255 K); then in 1992, further COBE measurements discovered tiny fluctuations (anisotropies) in the CMB temperature across the sky, at a level of about one part in 105.[82] John C. Mather and George Smoot were awarded the 2006 Nobel Prize in Physics for their leadership in these results.

During the following decade, CMB anisotropies were further investigated by a large number of ground-based and balloon experiments. In 2000–2001, several experiments, most notably BOOMERanG, found the shape of the universe to be spatially almost flat by measuring the typical angular size (the size on the sky) of the anisotropies.[99][100][101]

In early 2003, the first results of the Wilkinson Microwave Anisotropy Probe were released, yielding what were at the time the most accurate values for some of the cosmological parameters. The results disproved several specific cosmic inflation models, but are consistent with the inflation theory in general.[83] The Planck space probe was launched in May 2009. Other ground and balloon-based cosmic microwave background experiments are ongoing.

Abundance of primordial elements
Main article: Big Bang nucleosynthesis
Using Big Bang models, it is possible to calculate the expected concentration of the isotopes helium-4 (4He), helium-3 (3He), deuterium (2H), and lithium-7 (7Li) in the universe as ratios to the amount of ordinary hydrogen.[35] The relative abundances depend on a single parameter, the ratio of photons to baryons. This value can be calculated independently from the detailed structure of CMB fluctuations. The ratios predicted (by mass, not by abundance) are about 0.25 for 4He:H, about 10−3 for 2H:H, about 10−4 for 3He:H, and about 10−9 for 7Li:H.[35]

The measured abundances all agree at least roughly with those predicted from a single value of the baryon-to-photon ratio. The agreement is excellent for deuterium, close but formally discrepant for 4He, and off by a factor of two for 7Li (this anomaly is known as the cosmological lithium problem); in the latter two cases, there are substantial systematic uncertainties. Nonetheless, the general consistency with abundances predicted by BBN is strong evidence for the Big Bang, as the theory is the only known explanation for the relative abundances of light elements, and it is virtually impossible to "tune" the Big Bang to produce much more or less than 20–30% helium.[102] Indeed, there is no obvious reason outside of the Big Bang that, for example, the young universe before star formation, as determined by studying matter supposedly free of stellar nucleosynthesis products, should have more helium than deuterium or more deuterium than 3He, and in constant ratios, too.[103]: 182–185

Galactic evolution and distribution
Main articles: Galaxy formation and evolution and Structure formation
Detailed observations of the morphology and distribution of galaxies and quasars are in agreement with the current Big Bang models. A combination of observations and theory suggest that the first quasars and galaxies formed within a billion years after the Big Bang,[104] and since then, larger structures have been forming, such as galaxy clusters and superclusters.[105]

Populations of stars have been aging and evolving, so that distant galaxies (which are observed as they were in the early universe) appear very different from nearby galaxies (observed in a more recent state). Moreover, galaxies that formed relatively recently, appear markedly different from galaxies formed at similar distances but shortly after the Big Bang. These observations are strong arguments against the steady-state model. Observations of star formation, galaxy and quasar distributions and larger structures, agree well with Big Bang simulations of the formation of structure in the universe, and are helping to complete details of the theory.[105][106]

Primordial gas clouds

Focal plane of BICEP2 telescope under a microscope – used to search for polarization in the CMB[107][108][109][110]
In 2011, astronomers found what they believe to be pristine clouds of primordial gas by analyzing absorption lines in the spectra of distant quasars. Before this discovery, all other astronomical objects have been observed to contain heavy elements that are formed in stars. Despite being sensitive to carbon, oxygen, and silicon, these three elements were not detected in these two clouds.[111][112] Since the clouds of gas have no detectable levels of heavy elements, they likely formed in the first few minutes after the Big Bang, during BBN.

Other lines of evidence
The age of the universe as estimated from the Hubble expansion and the CMB is now in agreement with other estimates using the ages of the oldest stars, both as measured by applying the theory of stellar evolution to globular clusters and through radiometric dating of individual Population II stars.[113] It is also in agreement with age estimates based on measurements of the expansion using Type Ia supernovae and measurements of temperature fluctuations in the cosmic microwave background.[23] The agreement of independent measurements of this age supports the Lambda-CDM (ΛCDM) model, since the model is used to relate some of the measurements to an age estimate, and all estimates turn agree. Still, some observations of objects from the relatively early universe (in particular quasar APM 08279+5255) raise concern as to whether these objects had enough time to form so early in the ΛCDM model.[114][115]

The prediction that the CMB temperature was higher in the past has been experimentally supported by observations of very low temperature absorption lines in gas clouds at high redshift.[116] This prediction also implies that the amplitude of the Sunyaev–Zel'dovich effect in clusters of galaxies does not depend directly on redshift. Observations have found this to be roughly true, but this effect depends on cluster properties that do change with cosmic time, making precise measurements difficult.[117][118]

Future observations
Future gravitational-wave observatories might be able to detect primordial gravitational waves, relics of the early universe, up to less than a second after the Big Bang.[119][120]

Problems and related issues in physics
See also: List of unsolved problems in physics
As with any theory, a number of mysteries and problems have arisen as a result of the development of the Big Bang models. Some of these mysteries and problems have been resolved while others are still outstanding. Proposed solutions to some of the problems in the Big Bang model have revealed new mysteries of their own. For example, the horizon problem, the magnetic monopole problem, and the flatness problem are most commonly resolved with inflation theory, but the details of the inflationary universe are still left unresolved and many, including some founders of the theory, say it has been disproven.[121][122][123][124] What follows are a list of the mysterious aspects of the Big Bang concept still under intense investigation by cosmologists and astrophysicists.

Baryon asymmetry
Main article: Baryon asymmetry
It is not yet understood why the universe has more matter than antimatter.[32] It is generally assumed that when the universe was young and very hot it was in statistical equilibrium and contained equal numbers of baryons and antibaryons. However, observations suggest that the universe, including its most distant parts, is made almost entirely of normal matter, rather than antimatter. A process called baryogenesis was hypothesized to account for the asymmetry. For baryogenesis to occur, the Sakharov conditions must be satisfied. These require that baryon number is not conserved, that C-symmetry and CP-symmetry are violated and that the universe depart from thermodynamic equilibrium.[125] All these conditions occur in the Standard Model, but the effects are not strong enough to explain the present baryon asymmetry.

Dark energy
Main article: Dark energy
Measurements of the redshift–magnitude relation for type Ia supernovae indicate that the expansion of the universe has been accelerating since the universe was about half its present age. To explain this acceleration, general relativity requires that much of the energy in the universe consists of a component with large negative pressure, dubbed "dark energy".[10]

Dark energy, though speculative, solves numerous problems. Measurements of the cosmic microwave background indicate that the universe is very nearly spatially flat, and therefore according to general relativity the universe must have almost exactly the critical density of mass/energy. But the mass density of the universe can be measured from its gravitational clustering, and is found to have only about 30% of the critical density.[10] Since theory suggests that dark energy does not cluster in the usual way it is the best explanation for the "missing" energy density. Dark energy also helps to explain two geometrical measures of the overall curvature of the universe, one using the frequency of gravitational lenses,[126] and the other using the characteristic pattern of the large-scale structure--baryon acoustic oscillations--as a cosmic ruler.[127][128]

Negative pressure is believed to be a property of vacuum energy, but the exact nature and existence of dark energy remains one of the great mysteries of the Big Bang. Results from the WMAP team in 2008 are in accordance with a universe that consists of 73% dark energy, 23% dark matter, 4.6% regular matter and less than 1% neutrinos.[39] According to theory, the energy density in matter decreases with the expansion of the universe, but the dark energy density remains constant (or nearly so) as the universe expands. Therefore, matter made up a larger fraction of the total energy of the universe in the past than it does today, but its fractional contribution will fall in the far future as dark energy becomes even more dominant.[citation needed]

The dark energy component of the universe has been explained by theorists using a variety of competing theories including Einstein's cosmological constant but also extending to more exotic forms of quintessence or other modified gravity schemes.[129] A cosmological constant problem, sometimes called the "most embarrassing problem in physics", results from the apparent discrepancy between the measured energy density of dark energy, and the one naively predicted from Planck units.[130]

Dark matter
Main article: Dark matter

Chart shows the proportion of different components of the universe  – about 95% is dark matter and dark energy.
During the 1970s and the 1980s, various observations showed that there is not sufficient visible matter in the universe to account for the apparent strength of gravitational forces within and between galaxies. This led to the idea that up to 90% of the matter in the universe is dark matter that does not emit light or interact with normal baryonic matter. In addition, the assumption that the universe is mostly normal matter led to predictions that were strongly inconsistent with observations. In particular, the universe today is far more lumpy and contains far less deuterium than can be accounted for without dark matter. While dark matter has always been controversial, it is inferred by various observations: the anisotropies in the CMB, galaxy cluster velocity dispersions, large-scale structure distributions, gravitational lensing studies, and X-ray measurements of galaxy clusters.[131]

Indirect evidence for dark matter comes from its gravitational influence on other matter, as no dark matter particles have been observed in laboratories. Many particle physics candidates for dark matter have been proposed, and several projects to detect them directly are underway.[132]

Additionally, there are outstanding problems associated with the currently favored cold dark matter model which include the dwarf galaxy problem[91] and the cuspy halo problem.[90] Alternative theories have been proposed that do not require a large amount of undetected matter, but instead modify the laws of gravity established by Newton and Einstein; yet no alternative theory has been as successful as the cold dark matter proposal in explaining all extant observations.[133]

Horizon problem
Main article: Horizon problem
The horizon problem results from the premise that information cannot travel faster than light. In a universe of finite age this sets a limit—the particle horizon—on the separation of any two regions of space that are in causal contact.[134] The observed isotropy of the CMB is problematic in this regard: if the universe had been dominated by radiation or matter at all times up to the epoch of last scattering, the particle horizon at that time would correspond to about 2 degrees on the sky. There would then be no mechanism to cause wider regions to have the same temperature.[103]: 191–202

A resolution to this apparent inconsistency is offered by inflation theory in which a homogeneous and isotropic scalar energy field dominates the universe at some very early period (before baryogenesis). During inflation, the universe undergoes exponential expansion, and the particle horizon expands much more rapidly than previously assumed, so that regions presently on opposite sides of the observable universe are well inside each other's particle horizon. The observed isotropy of the CMB then follows from the fact that this larger region was in causal contact before the beginning of inflation.[28]: 180–186

Heisenberg's uncertainty principle predicts that during the inflationary phase there would be quantum thermal fluctuations, which would be magnified to a cosmic scale. These fluctuations served as the seeds for all the current structures in the universe.[103]: 207  Inflation predicts that the primordial fluctuations are nearly scale invariant and Gaussian, which has been confirmed by measurements of the CMB.[83]: sec 6

A related issue to the classic horizon problem arises because in most standard cosmological inflation models, inflation ceases well before electroweak symmetry breaking occurs, so inflation should not be able to prevent large-scale discontinuities in the electroweak vacuum since distant parts of the observable universe were causally separate when the electroweak epoch ended.[135]

Magnetic monopoles
The magnetic monopole objection was raised in the late 1970s. Grand unified theories (GUTs) predicted topological defects in space that would manifest as magnetic monopoles. These objects would be produced efficiently in the hot early universe, resulting in a density much higher than is consistent with observations, given that no monopoles have been found. This problem is resolved by cosmic inflation, which removes all point defects from the observable universe, in the same way that it drives the geometry to flatness.[134]

Flatness problem

The overall geometry of the universe is determined by whether the Omega cosmological parameter is less than, equal to or greater than 1. Shown from top to bottom are a closed universe with positive curvature, a hyperbolic universe with negative curvature and a flat universe with zero curvature.
The flatness problem (also known as the oldness problem) is an observational problem associated with a FLRW.[134] The universe may have positive, negative, or zero spatial curvature depending on its total energy density. Curvature is negative if its density is less than the critical density; positive if greater; and zero at the critical density, in which case space is said to be flat. Observations indicate the universe is consistent with being flat.[136][137]

The problem is that any small departure from the critical density grows with time, and yet the universe today remains very close to flat.[notes 4] Given that a natural timescale for departure from flatness might be the Planck time, 10−43 seconds,[1] the fact that the universe has reached neither a heat death nor a Big Crunch after billions of years requires an explanation. For instance, even at the relatively late age of a few minutes (the time of nucleosynthesis), the density of the universe must have been within one part in 1014 of its critical value, or it would not exist as it does today.[138]

Misconceptions
One of the common misconceptions about the Big Bang model is that it fully explains the origin of the universe. However, the Big Bang model does not describe how energy, time, and space were caused, but rather it describes the emergence of the present universe from an ultra-dense and high-temperature initial state.[139] It is misleading to visualize the Big Bang by comparing its size to everyday objects. When the size of the universe at Big Bang is described, it refers to the size of the observable universe, and not the entire universe.[140]

Another common misconception is that the Big Bang must be understood as the expansion of space and not in terms of the contents of space exploding apart. In fact, either description can be accurate. The expansion of space (implied by the FLRW metric) is only a mathematical convention, corresponding to a choice of coordinates on spacetime. There is no generally covariant sense in which space expands.[141]

The recession speeds associated with Hubble's law are not velocities in a relativistic sense (for example, they are not related to the spatial components of 4-velocities). Therefore, it is not remarkable that according to Hubble's law, galaxies farther than the Hubble distance recede faster than the speed of light. Such recession speeds do not correspond to faster-than-light travel.

Many popular accounts attribute the cosmological redshift to the expansion of space. This can be misleading because the expansion of space is only a coordinate choice. The most natural interpretation of the cosmological redshift is that it is a Doppler shift.[93]

Implications
Given current understanding, scientific extrapolations about the future of the universe are only possible for finite durations, albeit for much longer periods than the current age of the universe. Anything beyond that becomes increasingly speculative. Likewise, at present, a proper understanding of the origin of the universe can only be subject to conjecture.[142]

Pre–Big Bang cosmology
The Big Bang explains the evolution of the universe from a starting density and temperature that is well beyond humanity's capability to replicate, so extrapolations to the most extreme conditions and earliest times are necessarily more speculative. Lemaître called this initial state the "primeval atom" while Gamow called the material "ylem". How the initial state of the universe originated is still an open question, but the Big Bang model does constrain some of its characteristics. For example, if specific laws of nature were to come to existence in a random way, inflation models show, some combinations of these are far more probable,[143] partly explaining why our Universe is rather stable. Another possible explanation for the stability of the Universe could be a hypothetical multiverse, which assumes every possible universe to exist, and thinking species could only emerge in those stable enough.[144] A flat universe implies a balance between gravitational potential energy and other energy forms, requiring no additional energy to be created.[136][137]

The Big Bang theory, built upon the equations of classical general relativity, indicates a singularity at the origin of cosmic time, and such an infinite energy density may be a physical impossibility. However, the physical theories of general relativity and quantum mechanics as currently realized are not applicable before the Planck epoch, and correcting this will require the development of a correct treatment of quantum gravity.[19] Certain quantum gravity treatments, such as the Wheeler–DeWitt equation, imply that time itself could be an emergent property.[145] As such, physics may conclude that time did not exist before the Big Bang.[146][147]

While it is not known what could have preceded the hot dense state of the early universe or how and why it originated, or even whether such questions are sensible, speculation abounds on the subject of "cosmogony".

Some speculative proposals in this regard, each of which entails untested hypotheses, are:

The simplest models, in which the Big Bang was caused by quantum fluctuations. That scenario had very little chance of happening, but, according to the totalitarian principle, even the most improbable event will eventually happen. It took place instantly, in our perspective, due to the absence of perceived time before the Big Bang.[148][149][150][151]
Emergent Universe models, which feature a low-activity past-eternal era before the Big Bang, resembling ancient ideas of a cosmic egg and birth of the world out of primordial chaos.
Models in which the whole of spacetime is finite, including the Hartle–Hawking no-boundary condition. For these cases, the Big Bang does represent the limit of time but without a singularity.[152] In such a case, the universe is self-sufficient.[153]
Brane cosmology models, in which inflation is due to the movement of branes in string theory; the pre-Big Bang model; the ekpyrotic model, in which the Big Bang is the result of a collision between branes; and the cyclic model, a variant of the ekpyrotic model in which collisions occur periodically. In the latter model the Big Bang was preceded by a Big Crunch and the universe cycles from one process to the other.[154][155][156][157]
Eternal inflation, in which universal inflation ends locally here and there in a random fashion, each end-point leading to a bubble universe, expanding from its own big bang.[158][159]
Proposals in the last two categories see the Big Bang as an event in either a much larger and older universe or in a multiverse.

Ultimate fate of the universe
Main article: Ultimate fate of the universe
Before observations of dark energy, cosmologists considered two scenarios for the future of the universe. If the mass density of the universe were greater than the critical density, then the universe would reach a maximum size and then begin to collapse. It would become denser and hotter again, ending with a state similar to that in which it started—a Big Crunch.[17]

Alternatively, if the density in the universe were equal to or below the critical density, the expansion would slow down but never stop. Star formation would cease with the consumption of interstellar gas in each galaxy; stars would burn out, leaving white dwarfs, neutron stars, and black holes. Collisions between these would result in mass accumulating into larger and larger black holes. The average temperature of the universe would very gradually asymptotically approach absolute zero—a Big Freeze.[160] Moreover, if protons are unstable, then baryonic matter would disappear, leaving only radiation and black holes. Eventually, black holes would evaporate by emitting Hawking radiation. The entropy of the universe would increase to the point where no organized form of energy could be extracted from it, a scenario known as heat death.[161]

Modern observations of accelerating expansion imply that more and more of the currently visible universe will pass beyond our event horizon and out of contact with us. The eventual result is not known. The ΛCDM model of the universe contains dark energy in the form of a cosmological constant. This theory suggests that only gravitationally bound systems, such as galaxies, will remain together, and they too will be subject to heat death as the universe expands and cools. Other explanations of dark energy, called phantom energy theories, suggest that ultimately galaxy clusters, stars, planets, atoms, nuclei, and matter itself will be torn apart by the ever-increasing expansion in a so-called Big Rip.[162]

"""

In [6]:
num_lines = len(base_text.split('\n'))
num_lines

print('Nmber of tokens in text: ', str(len(qwen_2_1_5b_tokenizer(base_text)['input_ids'])))

Nmber of tokens in text:  10646


###3. Tests

We are now ready to conduct our tests.

We will define:

1) then how much (*keep_fraction_main_text*) of the main text we want to keep, controlling the size of the context

2) where (*rel_insert_pos_info*) we want to insert the core information ('Fred Hoyle coined the name "Big Bang"')

3) where (*rel_insert_pos_rel*) we want to insert the supporting info ('Fred Hoyle preferred the Steady-State model')


But first we'll define a function that will create the context when given these parameters:

In [7]:
def generate_context(keep_fraction_main_text,
                     rel_insert_pos_info,
                     rel_insert_pos_rel
                     ):

  max_num_lines = 1500


  num_selected_lines = int(keep_fraction_main_text * min(max_num_lines, num_lines))

  info_pos = min(int(rel_insert_pos_info * num_selected_lines), num_selected_lines)
  ref_pos = min(int(rel_insert_pos_rel * num_selected_lines), num_selected_lines)

  if info_pos <= ref_pos:
    insert_pos_1, insert_pos_2 = info_pos, ref_pos
    text_section_1, text_section_2 = info_text, relation_text
  else:
    insert_pos_1, insert_pos_2 = ref_pos, info_pos
    text_section_1, text_section_2 = relation_text, info_text


  selected_text = '\n'.join(base_text.split('\n')[:insert_pos_1]) + '\n' + text_section_1 + '\n'.join(base_text.split('\n')[insert_pos_1:insert_pos_2]) + '\n' + text_section_2 + '\n'.join(base_text.split('\n')[insert_pos_2:num_selected_lines])


  full_text = intro_text  + selected_text + outro_text + question_text

  mistral_7b_messages = [
      {"role": "system", "content": "You are a helpful assistant who is very good in finding and summarizing relevant information in a text."},
      {"role": "user", "content": full_text}]

  encodeds = qwen_2_1_5b_tokenizer.apply_chat_template(mistral_7b_messages, return_tensors="pt")

  model_inputs = encodeds.to("cuda")

  base_length = encodeds.shape[1]

  return full_text, model_inputs, base_length


Now let's look at the answer (we should repeat a few times as the asnswers fluctuate!):

In [8]:
## Set up the parameters, then do the runs

# ratio of how much of the text to keep in context
# suggested variations: 0.1, 0.2, 0.5, 1.0
keep_fraction_main_text = 1.0


# where will the main info ~'Fred Hoyle coined the term Big Bang' be inserted
# suggested variations: 0.01, 0.5, 0.9
rel_insert_pos_info = 0.01

# where will the other info about Fred Hoyle favoring the steady-state model be inserted
# suggested variations: 0.011, 0.51, 0.91 - also try close to main info vs far away
rel_insert_pos_rel = 0.911

# if you want uncommentb and change the question
#question_text = """Based on the text above who coined the term 'Big Bang' (not the model itself)? And which other cosmological model did that person prefer? Give a short answer covering both questions"""



#######

qwen_2_1_5b_inst_4bit.eval()


# use the generation function to generate the context information
full_text, model_inputs, base_length = generate_context(keep_fraction_main_text,
                                                       rel_insert_pos_info,
                                                       rel_insert_pos_rel)

num_iterations = 3

print('Original question: ', question_text)
print()
print('Setup: ')
print('\tContext length: ', base_length)
print('\tRelative position of core information:', rel_insert_pos_info)
print('\tRelative position of supporting information:', rel_insert_pos_rel)

for iteration in range(num_iterations):

  print('\nIteration:', str(iteration + 1))
  generated_ids = qwen_2_1_5b_inst_4bit.generate(model_inputs, max_new_tokens=200, do_sample=True)
  decoded = qwen_2_1_5b_tokenizer.batch_decode(generated_ids[:, base_length:])
  print('Generated Answer: \n',  decoded[0])


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Original question:  Based on the text above who coined the term 'Big Bang' (not the model itself)? And which other cosmological model did that person prefer? Give a short answer covering both questions

Setup: 
	Context length:  10932
	Relative position of core information: 0.01
	Relative position of supporting information: 0.911

Iteration: 1
Generated Answer: 
 <|im_start|>assistant
Fred Hoyle coined the term 'Big Bang'. He preferred the steady-state cosmological model.<|im_end|>

Iteration: 2
Generated Answer: 
 <|im_start|>assistant
English astronomer Fred Hoyle coined the term 'Big Bang'. Hoyle preferred the steady-state cosmological model.<|im_end|>

Iteration: 3
Generated Answer: 
 <|im_start|>assistant
English astronomer Fred Hoyle is credited with coining the term "Big Bang" during a talk for a March 1949 BBC Radio broadcast.[42] Hoyle originally preferred a model called the steady-state model, but it was rejected by the community of cosmologists.<|im_end|>


Hmm... not bad. It depends on the model. Play with more situations and other models!


Also, **when an A100 is available** we should use this notebook to go > 10k context length if memory allows it. How well does that work?