Heuristic analysis of the similarities between embeddings generated by the BigBird-CT model, as a further assessment of the model.

N.B. It is strongly recommended to run this notebook on a GPU.

In [3]:
from pathlib import Path
import pandas as pd
from sentence_transformers import SentenceTransformer, util
import numpy as np


pd.set_option('max_colwidth', None)

# load full testing data
test_path = Path.cwd().parent.joinpath('data/interim/test_unlabelled.pkl')
test = pd.read_pickle(test_path)

# load our fine-tuned BigBird-CT with in-batch negatives model
model_bigbird_ct_path = Path.cwd().parent.joinpath('models/bigbird-ct')
model = SentenceTransformer(model_bigbird_ct_path)

sentences = test['Concatenated'].tolist()
codes = test['ModuleCode'].tolist()

# get document embeddings for our testing set modules
embeddings = model.encode(sentences,
                          batch_size = 16,
                          show_progress_bar = True)

Batches:   0%|          | 0/54 [00:00<?, ?it/s]

Attention type 'block_sparse' is not possible if sequence_length: 676 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3. Changing attention type to 'original_full'...
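
The threshold quoted in this warning can be checked by hand. The sketch below only adds up the terms in the order the message lists them; the variable names are illustrative and are not part of any library API.

```python
# Values quoted in the warning above.
block_size = 64
num_random_blocks = 3
sequence_length = 676

# Terms as listed in the warning message, in tokens.
global_tokens = 2 * block_size
sliding_tokens = 3 * block_size
random_tokens = num_random_blocks * block_size
buffer_tokens = num_random_blocks * block_size

threshold = global_tokens + sliding_tokens + random_tokens + buffer_tokens
uses_full_attention = sequence_length <= threshold

print(threshold)            # 704
print(uses_full_attention)  # True
```

Because the reported sequence length (676) does not exceed the threshold (704), the model switches to `original_full` attention, as the message states.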


In addition to having measured the Spearman rank correlation coefficient between this model's cosine similarities and the labelled data, we now inspect several model-generated similarities and compare them with the actual passages of text. This serves as a further, heuristic evaluation of the model's performance.

In [4]:
# find the cosine similarity matrix for the embeddings
cos_sim = util.cos_sim(embeddings, embeddings)

# find the range of cosine similarity values
cos_sim_min = cos_sim.numpy().min()
cos_sim_max = cos_sim.numpy().max()
print(f'Range of cosine similarities: [{cos_sim_min}, {cos_sim_max}]')

Range of cosine similarities: [-0.1474636048078537, 1.0000005960464478]


We have cosine similarities ranging from approximately -0.15 to 1. What matters for a given cosine similarity is where it sits within this range: as long as pairs of documents with a cosine similarity of, say, 0.6 are generally more semantically similar than pairs with a cosine similarity of 0.4 (and so on for other values), we can be reasonably assured that the model performs acceptably. In other words, the range of the cosine similarity values is not important; their positions relative to each other are.
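
This point can be illustrated with a rank-based measure such as Spearman's coefficient, which depends only on ordering. The following is a minimal sketch in plain Python; the labels and scores are hypothetical values chosen for illustration, not data from the test set, and the helper functions are written here rather than taken from a library.

```python
def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run over all entries tied with order[pos].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values, for illustration only.
human_labels = [0.0, 0.2, 0.5, 0.7, 1.0]
model_scores = [-0.10, 0.35, 0.30, 0.62, 0.97]

# Rescale the scores from [-0.15, 1] onto [0, 1]; the ordering is unchanged.
rescaled = [(s + 0.15) / 1.15 for s in model_scores]

rho_raw = spearman(human_labels, model_scores)
rho_rescaled = spearman(human_labels, rescaled)
print(rho_raw, rho_rescaled)
```

Both calls give the same coefficient (0.9 for these toy values), since shifting and stretching the scores leaves their ranks untouched.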

We do note that our upper bound is slightly above 1, which should not be mathematically possible for a cosine similarity. This is likely due to floating point precision and is not an issue; we can consider this upper bound, effectively, to be exactly 1. (This is easily verified by manually inspecting pairs of documents whose cosine similarity equals this upper bound, as they are exactly identical documents.)
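
If exact bounds were ever needed downstream, the similarity could simply be clamped to the valid range. The sketch below shows the idea in plain Python with a hand-written cosine function (an illustration, not the `util.cos_sim` implementation); on the tensor used in this notebook, `torch.clamp` would presumably play the same role.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors, clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    raw = dot / (norm_u * norm_v)
    # Rounding can push the result marginally outside [-1, 1];
    # clamping restores the mathematical bound.
    return max(-1.0, min(1.0, raw))


vector = [0.1, 0.2, 0.3]
self_similarity = cosine_similarity(vector, vector)
opposite_similarity = cosine_similarity(vector, [-x for x in vector])
print(self_similarity, opposite_similarity)
```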

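We now consider a sample of cosine similarities, specifically one pair of documents for each of the cosine similarities closest to $$\{-0.2, 0, 0.2, 0.4, 0.6, 0.8, 1\}.$$ Since the minimum observed similarity is approximately -0.15, the pair selected for the target -0.2 is simply the least similar pair.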

In theory, as the cosine similarity increases, we expect the corresponding pairs of passages to be increasingly similar. Mathematically, we might describe this as a monotonic relationship.
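
The selection rule (for each target value, pick the pair whose similarity is nearest to it) can be illustrated on a small example first. This is a sketch on a hypothetical 4x4 matrix with made-up values; the names `toy_cos_sim`, `pairs`, and `closest` are illustrative only.

```python
from itertools import combinations

# Hypothetical 4x4 similarity matrix, for illustration only.
toy_cos_sim = [
    [1.00, 0.81, 0.12, -0.05],
    [0.81, 1.00, 0.43, 0.02],
    [0.12, 0.43, 1.00, 0.58],
    [-0.05, 0.02, 0.58, 1.00],
]

# Each unordered pair (i, j) with i < j appears once; the diagonal is skipped.
pairs = [(toy_cos_sim[i][j], i, j)
         for i, j in combinations(range(len(toy_cos_sim)), 2)]

targets = [0.0, 0.4, 0.8]
# For each target, keep the pair whose score has the smallest absolute gap.
closest = {t: min(pairs, key=lambda p: abs(p[0] - t)) for t in targets}

for target, (score, i, j) in closest.items():
    print(f'target {target}: pair ({i}, {j}) with similarity {score}')
```

Restricting to `i < j` avoids counting each pair twice and excludes every document's similarity with itself.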

In [5]:
# add all pairs to a list, with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# sort list in order of descending cosine similarity
all_sentence_combinations = sorted(all_sentence_combinations, key = lambda x: x[0], reverse = True)
all_sentence_combinations = np.array(all_sentence_combinations)

# get increasingly higher cosine similarity comparisons
comparisons = pd.DataFrame(columns = ['ModuleCode_A', 'Document_A', 'ModuleCode_B', 'Document_B', 'Cosine Similarity'])
for similarity in range(-1, 6):
    similarity /= 5
    index = np.absolute(all_sentence_combinations[:, 0] - similarity).argmin()
    score, i, j = all_sentence_combinations[index]
    i = int(i)
    j = int(j)
    comparisons.loc[len(comparisons)] = [codes[i], sentences[i], codes[j], sentences[j], float(cos_sim[i][j])]

comparisons

Unnamed: 0,ModuleCode_A,Document_A,ModuleCode_B,Document_B,Cosine Similarity
0,[HIS0002],semester two substitute for stage his capped special subject.,[GEO2043],"this is an important module for students taking the ba (hons) geography programmes. the module introduces students to the range and diversity of research methods used in human geography and in the social sciences more broadly. human geographers engage with social, cultural, economic and political life, and human geography as a discipline is rooted in empirically informed, conceptually focused research which enables those social, cultural, economic and political relations to be explored. the aim of this module is to provide students with a solid understanding of the diverse array of research methods and techniques that are used to collect data and conduct analysis and how these might work together in order to address research questions and prove or disprove hypotheses. the module’s core aims are: to introduce students to the diverse range of research methods used across human geography. to explore both conceptual and practical issues in research methodology and the use of research techniques. to draw connections between learning about tools and techniques for research on this module, and the production of research findings as explored through the range of modules offered across the geography programme. to make explicit connection between research methods in human geography and the development of graduate level transferable skills. to give students the confidence to proceed with original data collection and analysis for the dissertation at stage 3. finding research questions; perspectives on the research process. 
research design quantitative techniques and statistical analysis visualisation of data and gis interviews focus groups creative methods participant observation visual methodologies research diaries textual and document analysis coding and analysis of qualitative data mixed methods research ethics by the end of the module, students will have knowledge and understanding of the strengths and limitations of a range of research methods used in contemporary human geography and across the social sciences. students will be able to select an appropriate technique for a given research design or type of research question. students will understand the distinctions between primary and secondary data, extensive and intensive research designs, qualitative and quantitative data and analysis, positionality and the ethics of research. students will understand the issues of rigour and validity in the research process. by the end of the module, students will be able to identify a range of types of data (numeric, textual, experiential, verbal, visual) and will understand issues around the collection and analysis of these different data types. students will be able to undertake collection and analysis of different types of data, including numeric, visual, experiential, verbal and textual data, or understand the principles and practices behind data collection and analysis. students will have developed their transferable or graduate level skills, particularly in critical thinking and the synthesis of ideas, written and oral communication, participation in small group discussions, problem solving, information literacy, numeracy, self management and time management.",-0.147464
1,[MCH3074],"this module examines historical and contemporary issues and debates as they pertain to media, communication, and culture. analysis is guided by critical approaches developed in media and cultural studies. this module also expands the critical and theoretical understandings developed by students in culture, communications and the media in previous stages. the module aims to enable students to interrogate and develop their thinking and practices in relation to theoretical and issues based challenges. it is intended to help students to work on identifying and determining what is and is not critical detail and undertake critical self reflexivity. it will encourage students to pay attention to texture, argument, criticality, subtlety, and complexity. such aims support their final dissertations. the module draws on relevant theoretical texts and cultural practices to explore and comprehend culture, media and technology in the context of historical and recent developments in the field of media, communication and cultural studies. each week provides students with a detailed and critical understanding of a theorist, concept, medium, theory, text, practice, or event from a range of perspectives. the module encourages students to consider their own involvement in the matters at hand, and to be critically self reflexive about this. the module will thus facilitate a space in which to thoughtfully consider our identities and interests in relation to the demands of large but also intimate public spheres, challenges, and global contexts. assignments are structured around studies of social movements, media, events or cultural formations (e.g. class, ethnicity, gender). alternatively, students can focus on a particular direction in a theory’s or theorist’s trajectory. 
a student successfully completing the module will have knowledge of: the role and significance of theory and theorised accounts of culture, communications and media; major thinkers in the area, and their theoretical approaches; key concepts explaining the organization of meanings, understandings and subjectivity. a student successfully completing the module will be able to: engage critically with the key themes and issues that underpin the study of media, communication and cultural studies closely analyse, interpret and apply critical judgement in the understanding and the evaluation of concepts. gather, organise and deploy ideas to formulate arguments in effective written and oral forms. consider and evaluate their own work in a reflexive manner, with reference to academic issues and debates.",[HIS0002],semester two substitute for stage his capped special subject.,-4.4e-05
2,[TCP8902],"to critically discuss ethics and their relation to planning practice; to review ideas about responsibility, professional duty and professionalism, with particular regard to planning; to critically review the ways in which planners contribute their knowledge, skills and values in different contexts; to critically review the organizational contexts in which planning is practised. introduction : twenty first century town planning and ideas of reflexive practice roles : what planners do in public, private and voluntary work settings clients : who and what are planners working for? ethics : consequentialist, deontological and virtue ethical schools power and corruption in planning kindness and compassion in planning planning work and workplaces careers and organisational 'fit' the planning profession conclusion: becoming reflexive practitioners to develop an appreciation of the ethical dimensions of planning practice across organisational settings; to develop a greater awareness and critical understanding of the importance of values in planning; an ability to relate ethical issues to practice situations in planning; an ability to write a critical and logical discussion on these issues.",[CHY1110],"the aims of this module are to introduce students to the basic principles of organic chemistry; the basic concepts of organic reaction mechanisms, including the use of curly arrows; the structure and reactivity of the common functional groups, and the chemistry of biologically important classes of organic molecules. throughout the module, students will be introduced to: basic principles of organic chemistry nomenclature and chemical representations: drawing and naming molecules electronic structure: orbitals and hybridisation, delocalization and conjugation reaction mechanism: curly arrows, equilibria and reaction rates shapes of molecules: stereochemistry, isomerism, conformation and cyclic compounds carbonyls and carboxylic acids carbonyl functional groups. 
carbonyl structure: shape and electronic configuration, relative energies and shapes of the molecular orbitals, understanding reactions in terms of simple orbital interactions reactions at the carbonyl group: nucleophilic addition; nucleophilic substitution; nucleophilic substitution with loss of carbonyl oxygen formation and reactions of enols and enolates introduction to functional group chemistry review of key concepts nucleophilic substitution: sn1 and sn2 elimination reactions: e1 and e2 addition to alkenes reduction oxidation molecules of life amino acids, peptides and proteins nucleosides and nucleotides dna and rna at the end of this module students will: understand the concept of molecular orbitals, hybridisation and the relevance to molecular shape and reactivity have an appreciation of the chemistry of carbonyl functional groups, including the mechanisms and common reagents used for nucleophilic additions, substitutions and enol/enolate chemistry. understand the mechanisms and stereochemical course of sn1, sn2, e1 and e2 reactions and the influence of substrate structure and reaction conditions on selectivity in these processes. appreciate the chemistry of alcohols, alkyl halides, and alkenes. have an understanding of the basic structure and reactivity of the different classes of 'molecules of life' and their context in living systems. at the end of this module a student will be able to: draw organic structures in the correct skeletal form with realistic bond angles visualise the 3d structure of molecules from the 2d representation on paper assign configuration of stereochemistry (e/z, r/s) represent reaction mechanism with the correct use of curly arrows apply the fundamental principles delivered in the module to related, unseen situations",0.200004
3,[NBS8615],"the aim of this module is to provide an overview of the methods used in experimental economics and their applications. students will learn to critically assess why economists may benefit from conducting experiments and what insights they can provide. they will also be introduced to seminal papers in the field. this module will also equip students with the necessary knowledge and tools to design and conduct their own economic experiments, in line with best practices and conventions. the following list contains an indicative set of topics: introduction o rationale for the experimental method o design principles taxonomy of experiments o lab experiments o field experiments seminal market experiments o competitive market equilibrium o information aggregation measuring preferences o elicitation methods auctions experimental games practical considerations in experimental research o pre registration and replicability o econometrics for experimental data by the end of the module, students should be able to demonstrate: an ability to critically assess experimental literature an awareness of the strengths and limitations of the experimental approach in economics. by the end of the module students should be able to: design a new experiment to answer an economic research question conduct experimental sessions analyse experimental data report and critically assess experimental results",[SPG8007],"the aim of the module is to provide students with comprehensive and systematic knowledge to support the integration and use of hydrogen and a circular economy in the industrial sector. in particular, the module will focus on the opportunities for using hydrogen, how it is produced and its use in fuel cell technology. the module provides essential material for the circular and hydrogen economies and how fuel cell technology plays a role, including biological fuel cells. 
opportunities for using hydrogen; hydrogen in a circular economy; hydrogen production / generation, storage and distribution of hydrogen as a fuel; future potential methods for generating hydrogen based on renewable energy or fuels. appropriate physical chemistry: units, thermodynamics, kinetics and basic electrochemistry. reactor engineering (introduction to chemical engineering, material and energy balances, heat transfer separation processes and process design) sustainable technologies for a circular economy. power sources within a circular economy model hydrogen reforming technology hydrogen storage hydrogen production hydrogen economy biological fuel cell, chemical fuel cell and batteries technology fit in a circular economy. it is anticipated that students who successfully complete this module will be able to: demonstrate an understanding on key technologies that support a circular economy as part of sustainable development demonstrate a knowledge and understanding of hydrogen systems, storage, production and its application in fuel cells. using appropriate insight, be able to develop and design appropriate hydrogen energy systems for use with fuel cell systems. understand the definition and applications of biological fuel cells. at the end of this module students should be able to: demonstrate a critical understanding of chemical engineering concepts applied to hydrogen and circular economy systems. demonstrate a critical understanding of theoretical concepts and practical implementation associated with hydrogen in energy systems. the ability to apply design tools for power systems using hydrogen and/or a circular energy models. they should also possess the following cognitive skills: the ability to collate, analyse and evaluate data associated with the selection and design of fuel cell systems. the ability to critically analyse and select power source systems. 
the ability to identify and solve problems, produce and appraise solutions to generate energy systems that follow a circular economy model. the ability to apply design tools for electrochemical, hydrogen power systems. they should also possess the following transferable skills: problem solving skills applied to open ended hydrogen system design. communication skills developed through assessed work and class discussions. the ability to plan and manage study time particularly in pre and post school periods. use of a variety of it skills.",0.4
4,[SPE1051],"this module aims to provide an introduction to research methods and the formulation of research questions. the module is taken by both bsc and i am students. the bsc students will combine this course with the research methods in practice course in year 2, enabling students to understand the application of research methods to the clinical contexts in which they will be working using evidence based principles. the i am students will combine years and with more advanced study leading to the dissertation in years and throughout the course, learning is cumulative, one year building on another, with concepts recurring and developing over the course. the main objective of this course is to help you understand the principles of experimental design and provide you with an introduction to statistics. we hope that after you complete this course you will be able to understand how a diverse range of research methods are employed to collect data, to analyse data systematically, to describe data faithfully, to formulate hypotheses and use data to evaluate those hypotheses. the concepts, knowledge and skills this course introduces are also fundamental to just about every other course you will follow: you will need material from this course for your child study, for all your clinical placement case studies and case presentations throughout your programme. you will need to have learned all the material to make any sense of the psychology courses, child development modules, speech language pathology courses. every article you read, test you evaluate, treatment you plan, diagnosis you propose, prognosis you make, essay you write will require recourse to concepts, facts and techniques which you will start to learn about in this course. the course will be delivered as a mixture of lectures, lab sessions and participation in research studies. 
for students studying the clinical programmes (bsc speech & language therapy and masters of speech & language sciences), the hcpc standards of proficiency are of relevance. 14.12 recognise the value of research to the critical evaluation of practice 14.13 be aware of a range of research methodologies syllabus covers: introduction to the course. different sorts of enquiry and research methods. evidence based practice as a framework, population vs. samples. descriptive vs. inferential statistics. measures of central tendency: mean, median, mode. measures of dispersion: ranges, mean deviation, standard deviation; standard error. frequency distributions. normal distribution and its properties. standard scores: z scores, t scores, percentiles. parametric vs. nonparametric statistics. types of data: nominal, ordinal, interval, ratio. basic experimental designs: between subjects and within subjects. dependent and independent variables. hypotheses and tails. hypothesis testing: probability, significance, alpha levels, type i and ii errors. introduction to inferential statistics. choosing tests. lab: introduction to spss. nonparametric tests for one and two samples nominal data. lab: nonparametric tests for comparing two samples ordinal data. lab: nonparametric tests for comparing three or more samples. lab: parametric tests for comparing one, two and three samples: t tests, one way analysis of variance. lab: correlation measures: tests of association and relation. revision for the semester and preparation for class test. semester and semester completion of a total of three hours of research work as a participant, from a range of experimental research studies available (or completion of a word essay on a subject connected with research design or ethics to be agreed with the module leader). 
for students studying the clinical programmes (bsc speech & language therapy and masters of speech & language sciences), relevant aspects of rcslt curriculum guidelines: this module contributes to the key graduate capabilities around research and evidence based practice (4.2.4), with a focus on section b research skills and methods allowing the students to demonstrate the knowledge and skills required to understand, interpret and apply research to practice. knowledge of the principles underlying experimental design knowledge of experimental and correlational techniques, including conceptual foundations (ii) knowledge of when it is appropriate to use these, and (iii) potential pitfalls. ability to create an experimental design to address a specific hypothesis. ability to prepare data ready for a statistical analysis ability to use a range of statistical techniques to analyse data (non parametric and parametric statistics using with and between group design) using appropriate statistical software (e.g. spss)","[BMN1004, CMB1005]","this module aims to: provide a range of both practical laboratory skills and generic study skills essential to students studying biomolecular and biomedical sciences. provide opportunities for students to apply and strengthen theoretical knowledge gained in complementary and co requisite modules in the performance of key analytical techniques and interpretation of data generated. develop safe laboratory practice the module is structured into four strands, three that align with the co requisite theoretical modules and a generic skills strand. the generic skills strand consists of: basic biology knowledge assessment; basic chemistry knowledge assessment, introductory maths skills assessment and support seminars; study skills seminars; lectures on good academic practical and scientific writing (essays and lab reports); information retrieval exercise; basic data analysis; and introductory laboratory skills. 
the biochemistry practical skills strand consists of laboratory based practicals on spectrophotometry, ion exchange chromatography and enzyme kinetics, as well as online practical material covering the control of gene transcription. the cell biology practical skills strand consists of laboratory based practicals on osmosis, ph and buffers, and neuromuscular function, as well as online practical material covering the microscopic observation of unicellular eukaryotes. the genetics practical skills strand consists of laboratory based practicals on the genetic transformation of e. coli, use of pcr/electrophoresis for genetic analysis, as well as a computer based practical covering gene linkage. online material will supplement practical skills as formative pre and post practical quizzes. on successful completion of this module students will be able to: explain the underlying principles of a number of essential practical techniques used to investigate cell biology, genetics and biochemistry. describe health and safety precautions which need to be taken when working in cell biology, genetics and biochemistry laboratories. describe mathematical formulae necessary for calculation of concentrations and dilutions relevant to cell biology, genetics and biochemistry practical work. on successful completion of this module students will be able to: carry out a number of essential practical techniques used to investigate cell biology, genetics and biochemistry. work safely in cell biology, genetics and biochemistry laboratories (legal awareness). record and analyse biological data from experiments in cell biology, genetics and biochemistry (critical thinking, data synthesis and numeracy). carry out scientific calculations relevant to cell biology, genetics and biochemistry practical work, including calculation of concentrations and dilutions (data synthesis and numeracy, problem solving). 
use computer applications to explore concepts in cell biology, genetics and biochemistry practical work (use of computer application and online materials). independently search literature for information on a cell biology topic (information literacy, independence). write a structured laboratory report on a practical associated with the module (literacy, synthesis and present materials, written communication skills).",0.599999
5,[MUS8017],"this module aims to provide a platform for postgraduate students in sacs to choose a single specialist musical practice from practical projects across a range of different fields. this may include indian classical music, free improvisation, composing folk music, and others offered on a year by year basis. students will be advised about which topics are running in any academic year during the registration period. students will meet throughout the year for scheduled workshops with key creative practice staff and occasional visiting experts. students will have chosen one specialist field of practical study during the registration period each academic year. topics on offer may include indian classical music (beginners or intermediate), free improvisation, baroque opera, contemporary opera, and folk composition or other specialist fields. syllabus includes scheduled workshop/seminars with key staff in the delivery of the particular specialist fields on offer in any academic year. a development of knowledge in the chosen specialist area. the development of practical skills in one specialist field of musical practice. the ability to work creatively within a specialist field of musical practice. the ability to evidence knowledge and development in a specialist field of musical practice.","[MUS8012, MUS8013, MUS8014, MUS8015]","this module aims to provide a platform for postgraduate students in sacs to develop their own creative musical composition (across a range of genres) and to receive useful feedback from staff about the development of their creative practice throughout the year. students will meet throughout the year for scheduled workshops with key performance staff in music and occasional visiting experts where they present their own performance research and receive individual feedback for development. syllabus includes scheduled workshop/seminars with key staff in music across the year, with follow up tutorials for students. 
syllabus is student led in that they bring their own performance research for iterative development across the year. all students benefit from observation and participation in workshops with key teaching staff. advanced awareness of the creative potential of musical/sonic composition. advanced understanding of how practice research unfolds in the development of musical composition. development of an understanding of how to communicate practice research in music. development of the understanding of how to document and present the research process in practice research. awareness of how others implement research as practice in musical composition. ability to present a multimodal argument that demonstrates the research elements of composition, and how the notion of research as practice is manifested in their own, and others’ musical composition. enhanced ability to undertake independent research to produce original musical composition research. improved musical compositional ability. enhanced ability to communicate the research process in practice research.",0.800044
6,[HSS3220],"this is a dummy module is for administrative purposes only, and should only be selected as a placeholder for off regulation choices during the pre registration period.",[HSS2110],"this is a dummy module is for administrative purposes only, and should only be selected as a placeholder for off regulation choices during the pre registration period.",1.0


We begin by writing some comments about the similarity between the two documents at each cosine similarity value.

| Cosine Similarity | Comments                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| -0.147464         | Document A is very short, referring to a dummy module. Document B is much longer, and refers to a very specific module, one that is specifically for the BA (Hons) Geography programmes.<br>Document B refers to many specific concepts, such as data analysis, ethics of research and human geography, whereas Document A is very vague.<br>These documents are highly dissimilar.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| -0.000044         | Document A is quite long, referring to concepts within media, communication and culture. Document B is very short and refers to a dummy module (interestingly, the same one that appears in the pair with cosine similarity -0.147464).<br>Document A refers to many specific concepts, such as cultural studies, accounts of culture and historical developments, whereas Document B is very vague.<br>These documents are highly dissimilar. |
| 0.200004          | Document A discusses ethics within planning practice. Document B refers to organic chemistry.<br>Document A mentions more general concepts and questions, whereas Document B refers to specific concepts within organic chemistry (e.g. nucleophilic addition).<br>These documents are dissimilar, as they refer to different topics: Document A describes more general ethical themes, whereas Document B refers to specific scientific concepts. |
| 0.400000          | Document A refers to experimental economics. Document B refers to hydrogen use within the industrial sector.<br>Document A discusses specific scientific concepts within economics, such as market equilibrium, assessing experimental results and assessing experimental literature. Document B refers to how hydrogen is produced and used as fuel, mentioning physical chemistry, reactor engineering and heat transfer, among many others.<br>These documents are generally different, but share some similarity in that they both discuss domain-specific scientific methods within their respective disciplines. |
| 0.599999          | Document A refers to research methods and questions within the clinical context. Document B refers to practical laboratory and study skills, in the biomedical/biomolecular context.<br>Document A is generally on the subject of understanding experimental design and introducing statistics, and mentions that this is for the purpose of applying these methods to psychology, speech language pathology and child development. Document B generally describes developing analytical techniques, data understanding and practical skills, with reference to applying these to spectrophotometry, ion exchange chromatography and enzyme kinetics, among other areas.<br>These documents are related, as they both describe learning general scientific and quantitative skills to be applied to their respective domains, and those domains are themselves related. |
| 0.800044          | Document A discusses postgraduate students choosing a specialist musical practice for practical study. Document B discusses postgraduate students developing their own creative musical composition.<br>Both documents are within the musical context and refer to scheduled workshops and working with visiting experts, among others. Their key difference is that one deals with musical practice, and the other with developing a musical composition.<br>These documents are highly related, as they both refer to the same general area, module and learning structure, but focus on different techniques within music. |
| 1.000000          | Document A and Document B are both very short, mentioning that they are dummy modules.<br>Document A and Document B are identical to each other.<br>These documents are semantically, and indeed lexically, identical. |

Hence, looking at each cosine similarity in turn, we see that as the cosine similarity increases, the documents within each pairing become increasingly semantically similar. On this basis, it is reasonable to say that our modelling pipeline has generated document embeddings that are representative of the semantic content of the documents within our corpus.

The only arguable exception is the pair of comparisons at cosine similarities -0.147464 and -0.000044, whose documents appear roughly equally dissimilar to each other. However, this does not mean that the documents at similarity -0.000044 are more dissimilar than those at similarity -0.147464, and thus the monotonic relationship is not broken.

We now repeat the above, this time looking at the cosine similarities closest to $$\{-0.1, 0.1, 0.3, 0.5, 0.7, 0.9\},$$ as a further investigation into the relationship between the computed cosine similarities and the semantic similarity of the documents.

In [6]:
# get increasingly higher cosine similarity comparisons, for the alternative set of similarities
comparisons_2 = pd.DataFrame(columns = ['ModuleCode_A', 'Document_A', 'ModuleCode_B', 'Document_B', 'Cosine Similarity'])
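for step in range(-1, 5):
    # step / 5 + 0.1 yields the targets -0.1, 0.1, 0.3, 0.5, 0.7, 0.9
    similarity = step / 5 + 0.1
    # pick the document pair whose cosine similarity is closest to the target
    index = np.absolute(all_sentence_combinations[:, 0] - similarity).argmin()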
    score, i, j = all_sentence_combinations[index]
    i = int(i)
    j = int(j)
    comparisons_2.loc[len(comparisons_2)] = [codes[i], sentences[i], codes[j], sentences[j], float(cos_sim[i][j])]

comparisons_2

Unnamed: 0,ModuleCode_A,Document_A,ModuleCode_B,Document_B,Cosine Similarity
0,[HIS0001],semester one substitute for stage three his capped modules.,[SEL3420],"the aim of this module is to analyse how ann radcliffe’s concept of the ‘explained supernatural’, and its debt to edmund burke’s philosophy of the sublime, shaped the evolution of fiction between the late eighteenth and late nineteenth centuries. students will develop knowledge of a range of canonical and non canonical texts, including examples of literature written by authors whose engagement with the supernatural is less well known. we will focus particularly on the process through which burke and radcliffe's theory of ‘terror’ illuminates contemporary anxieties surrounding gender, class, race and nationhood, and examine how these fears were transformed throughout the eighteenth century, romantic and victorian eras. students will gain a thorough knowledge of the historical and cultural contexts which inspired the rise of the supernatural in fiction. they will also combine this knowledge with some contemporary philosophies of the human mind, in order to interpret supernatural entities as manifestations of tyranny, repressed desire, and fear of ‘the other’. we will connect these historical and cultural changes with formal and generic developments in the literature of the period, paying particular attention to the way that victorian writers re imagined tropes such as the natural landscape, anti hero, and angel/whore dichotomy for a new age. the module will culminate by questioning the extent to which the sublime and ‘explained supernatural’ not only uncover cultural anxieties, but also promote education, reform and the toleration of difference as their most powerful remedy. the syllabus comprises a range of genres, including poetry, novels, and short stories written between and 1890. due to the philosophical and psychoanalytical themes of the module, it will also include some contemporary texts that illuminate the theoretical frameworks of the sublime and 'explained supernatural'. 
the syllabus may vary year to year, but key authors may include horace walpole, ann radcliffe, samuel taylor coleridge, lord byron, jane austen, john keats, john william polidori, emily brontë, joseph thomas sheridan le fanu, rudyard kipling and william butler yeats. to complete the module successfully, students will be required to demonstrate: a thorough understanding of the theoretical concepts of the sublime and 'explained supernatural'; an ability to apply this philosophical framework to the changing historical and cultural contexts in which the module texts were written and read; an understanding of the evolution of contemporary anxieties surrounding gender, class, race and nationhood; a detailed knowledge of the content and formal aspects of the set texts; an ability to evaluate contemporary and current critical positions on the set texts. by the end of the module, students should be able to: critically compare and contrast different texts and contexts; offer historically and theoretically informed insights into specific texts both orally and in writing; assimilate and evaluate information from a number of different literary, critical and contextual sources; bring a thorough understanding of the generic and formal aspects of the set texts to analysing module themes; produce with others, in small groups, notes and oral presentations; reflect upon their participation and essay writing skills via a combination of formative and summative assignments.",-0.099728
1,[TMP9001],created directly in sap to allow exchange students to be registered in to a community in blackboard lp july 2012,[MAS3913],"to achieve an understanding of linear models, and how regression, analysis of variance (anova) and analysis of covariance (ancova) models arise as special cases. to understand the problem of identifiability in anova, and the role played by parameter constraints and dummy variables in solving it. to achieve an understanding of generalized linear models and achieve familiarity with the most common families, understanding how logistic regression and log linear models arise as special cases. to understand asymptotic maximum likelihood theory for more than one parameter and its application to generalized linear models. to understand the exponential family, and demonstrate that certain distributions belong to this. module summary the first part of this module is concerned with building and applying statistical models for data. how does a mixture of quantitative and qualitative variables affect the body mass index of an individual? suppose we find an association between age and body mass index, how can we study if this association varies between men and women, or between those with different educational backgrounds? in this course we consider the issues involved when we wish to construct realistic and useful statistical models for problems which can arise in a range of fields: medicine, finance, social research and environmental issues being some of the main areas. we revise multiple linear regression models, and see how they are special cases of a general linear model. we move on to consider analysis of variance (anova) as another special case of a general linear model – this is the problem of investigating contrasts between different levels of a factor in affecting a response and then we generalize to the case of several factors. 
we consider analysis of covariance (ancova) which involves mixing linear regression and factor effects, and the idea of interaction between explanatory variables in the way they affect a response. the module provides a comprehensive introduction to the issues involved in using statistics to model real data, and to draw relevant conclusions. there is an emphasis on hands on application of the theory and methods throughout, with extensive use of r. the second part of the module builds on the first part (linear modelling) by introducing a generalized framework of models which allow us to generalize away from normally distributed errors to different kinds of random outcomes, building in an appropriate transformation of the linear function of explanatory variables to match. we note that the general linear models studied in the first part of the module exist as a special case. we generalize linear models to study the topic of generalized linear models, allowing us to build non linear relationships into our models, and to study many different types of outcome measure which could not have been handled using general linear models. we consider asymptotic maximum likelihood estimation for the multi parameter case, including the use of information matrices in parameter estimation and likelihood ratio tests for comparing nested models. these ideas are applied to generalized linear models. we study in depth the special cases involved with binomial outcomes, logistic regression, where we are interested in how explanatory variables affect the success rate, and then log linear models, which enable us to study among other things, contingency tables involving more than two factors. this module opens up the possibility to study many kinds of real life situations which were inaccessible to linear models, allowing us to study many realistic and important problems. there is an emphasis on hands on application of the theory and methods throughout, with extensive use of r. 
the general linear model: maximum likelihood in the multi parameter case; estimation of parameters; prediction; model adequacy; regression, anova and ancova as special cases. model choice. analysis of designs with 1, or factors. model identifiability, parameter constraints and dummy variables. use of transformations. various extended examples of statistical modelling using r. generalized linear models: overall construction as generalization of linear models; binomial regression with various links; poisson regression; log linear models and their use for contingency tables. asymptotic distribution of the maximum likelihood estimator in the multi parameter case. the exponential family of distributions. various extended examples of statistical modelling using r. students will know the theory and techniques of modelling normal outcomes in terms of categorical and continuous covariates using the general linear model. the students will extend their knowledge of the theory of general linear models to encompass outcomes from several non normal exponential family distributions. they will understand asymptotic maximum likelihood theory for multi parameter distributions, and the exponential family of distributions. the ability to determine the appropriate statistical model to use, to be able to use r to fit the model and to be able to interpret the fitted model. the ability to identify the kind of design and modelling approaches needed to address a wide variety of real life statistical problems, and the ability to implement appropriate statistical modelling procedures using r. the ability to identify the kind of modelling approaches needed to address a wide variety of real life statistical problems, and the ability to implement appropriate statistical modelling procedures using r.",0.099956
2,[ACC2025],"to examine current financial reporting practice and how it impacts on companies. to develop accounts preparation skills. preparation and presentation of published financial statements of limited companies in accordance with selected international accounting standards. by the end of the module, students should be able to: apply the requirements of company law and international financial reporting standards concerning the format and content of company financial statements. assess and compare the effects of accounting policy choice on reported income, net assets and capital by the end of the module, students should be able to: prepare published financial statements of limited companies in accordance with ifrs",[MMB8047],"this module will provide students with a core understanding of evolutionary theory and biology, designed specifically for students from diverse disciplinary backgrounds in behavioural sciences. the module will give students an understanding of the historical context of evolutionary theory through a brief history of evolutionary biology, and progress into a detailed understanding of the modern synthesis and associated concepts. finally, the course will engage the students in applying these concepts to human behaviour in particular, covering current research in behavioural ecology, evolutionary medicine and cultural evolution. as well as providing a primer for the core concepts of modern evolutionary theory and practice, the course situates evolutionary biology in a wider context, covering the historical development of evolutionary theory, its philosophical ramifications, and its impacts and applications in the human social, behavioral and health sciences. the module is primarily designed as a unifying thread for the mres strand in evolution and human behaviour, but students from other strands are welcome and will not be at a disadvantage. 
the module will start with an introductory session and then include material on the following topics: o principles of evolutionary biology o history of life and the hominid line o variation and heredity o competition and selection o population genetics o genetics and behaviour o cooperation, competition and game theory o plasticity and learning o language and communication o gene culture co evolution o cultural evolution o complex systems and collective behaviour on completing the module students should be able to: display a firm grasp of the intellectual history of evolutionary biology and what evolutionary theory aims to explain 2.discuss the mechanisms that give rise to adaptation and the biological evidence for evolutionary processes 3.explain the core the concepts in evolutionary theory and how these can be applied to the study of human and behaviour and culture on completion of the module students should be able to: communicate clearly and informedly about evolutionary theory in historical context, particularly as it applies to human behaviour, including contentious issues in this application critically and constructively evaluate research in the behavioural sciences in an evolutionary context apply evolutionary thinking to empirical studies of behaviour, e.g., in developing ideas for a research project.",0.299998
3,[CME3040],"this module allows students to study important chemical engineering principles in the laboratory. the experiments demonstrate important concepts from the fluids, heat transfer, thermodynamics, dynamics & control and mass transfer modules. laboratory projects: students will complete and write reports on an experiment from each of the following areas: transfer processes, control, reactor engineering, separation processes and heat exchangers. understanding of: principles and theories underlying process heat transfer, the concepts used to construct a plant measurement and control system, the principles underlying the design of drying processes, the measurement of equilibrium data and the differences in performance between reactors. they will be able to conduct safely experiments to assess the performance of process equipment, conduct experimental investigations into chemical engineering principles, prepare a technical report on the outcome of the practical work, analyse experimental results and calculate the errors associated with experimental measurement. general: oral presentation, report writing. through study of this module the students will enhance their general transferable skills in problem solving and presentation.",[NBS8630],"the aim of this blocked course is to prepare students for advanced studies in “mathematics for economics and finance”, “introductory econometrics”, and economic theory modules. this course aims at refreshing univariate and multivariate calculus. it also aims at introducing key concepts of linear algebra and illustrates the role matrices play in economics, and econometrics. 
real analysis (series and sequences, continuous functions, taylor’s theorem, mean and extreme value theorems) univariate and multivariate calculus algebra of matrices and linear mappings’, determinants, diagonalization, and canonical forms, and vector spaces probability functions, conditional probability, permutations and combinations, bayes rule continuous and discrete random variables, univariate and multivariate distributions, expectation and conditional expectation, moments sampling distributions, hypothesis testing, confidence intervals students will be able to manipulate equations and analyse properties of continuous and differentiable functions using calculus. students will have a good working knowledge of matrix algebra. student will understand key concepts of probability theory, random variables, and statistical inference. student will develop problem solving skills using calculus, real analysis, matrix algebra and statistics. students will be able to analyse economic problems using a variety of mathematical and statistical techniques and interpret the results.",0.500001
4,"[BUS2074, LBU2074]","the module is designed to promote an understanding of the nature and importance of the management of people and organisations in the context of the challenges faced by organisations operating in the global business environment. the module aims to enable students to understand the dimensions that influence management policies and practices internationally, and to gain awareness of how conceptual debates shape the management of people and organisations and translate into practical challenges in different countries. as a direct result of internationalisation and globalisation the scope and practice of management and decision making in organisations is wider and affected by multiple forces that transcend domestic realities, such as changes in demographics, patterns of migration, patterns of global staffing, global economic landscape, knowledge to attitudinal/behavioural characteristics ratio, trends in talent rotation across international business units, rise of knowledge workers, and changes in national and supranational policy frameworks. more concretely, the module aims to support students in: problematising the key concepts in the area of the management of people and organisations and how these are used to address issues in the global business environment. using key concepts to understand and make sense of people management experiences in organisations. scrutinising assumptions about the management of people and organisations. 
managing people and organisations in the global business environment: from local to global managing people in organisations: challenges, controversies and contradictions managing organisational practices: power, politics, ethics and decision making managing organisational structures and processes: towards a post bureaucratic organisational model by the end of the module the student will be able to: summarise implications of key issues affecting the management of people in organisations in the global business environment. explain differences and similarities in the choices and challenges organisations face for the management of people in different regions and countries. identify the impact of national institutions and systems on design and implementation of management policies and practices in different countries. demonstrate insight into the impact of globalisation on the management of people in organisations operating internationally. by the end of the module the student will be able to: analyse issues surrounding the different ways in which people and organizations can be managed recognise the importance of the management of people and organizations, and of the relationship between this and other aspects of management; both sufficient to have a material impact on the subsequent managerial behaviour of students. conduct critical analysis of current issues facing the management of people and organisations in the global business environment through the analysis, structure and management of ideas and arguments. demonstrate autonomy, accountability and teamwork skills through managing a group project, organising working time, working professionally with others and meet deadlines. demonstrate good communication through the preparation of i.e. a report,and presenting and communicating an argument to an audience through that report.",[NBS8385],"this module focuses on ways in which we think about employment and global hrm. 
this is important because how we think about any topic fundamentally frames how we ""make a difference"" as practitioners. the key purpose of the module is to provide students with different ways of thinking about hrm so students can develop the skills and knowledge needed to support the development of more humane workplaces. the topic introduced will look at contemporary issues employment and global hrm and how our thinking about these issues shapes our understanding of the challenges they pose to hrm practitioners. topics will include the changing structure of employers and global organisation of production; labour markets as ""special"" institutions; the employment of migrant workers; bodies at work; minds at work; and, the professionalisation of hrm. the syllabus may change from year to year to reflect current research. topics likely to be include, but are not limited to: thinking differently about global organisational structures thinking differently about labour market institutions thinking differently about migration, work, and employment thinking differently about bodies at work: aesthetic and emotional labour thinking differently about minds at work thinking differently about the professionalisation of the hr function use social theory to understanding of key issues within contemporary work and organizations, and how these relate to the practices of hrm. understand the relationship between diverse individuals, different workplace experiences and contemporary workplace settings. to understand contemporary social trends and the consequences of these for workers, organization and the possibilities of hrm critically discuss the role of power and social institutions in affecting social processes at work and how these impact hrm. explain and evaluate the relationship between individuals, institutions and wider societal structures. 
at the end of the module students should be able to evaluate and critically debate the relationship between individual, work, organisations and wider society, and the role of hrm within this context. they should be able to use social theory to understand and evaluate contemporary workplaces settings. they should understand the practical implication of contemporary social processes for the history, emergence and potential of hrm.",0.699997
5,[EEE2012],"to introduce students to topics in electrical engineering and control including polyphase circuits; polyphase synchronous machines; polyphase induction machines; three phase transformers; three phase distribution systems; electric drives; system modelling and simulation using matlab/simulink; analytical and numerical solutions of dynamic systems; transfer functions and time domain response; open loop and closed loop control; pid and root locus based control. introduction to dynamic systems and control: system modelling using ordinary differential equations odes; modelling of electrical, mechanical and electromechanical systems; introduction to the use of matlab/simulink in studying system dynamics. analytic and numerical solution methods of odes: first order, second order and higher order systems; analytic methods to solve odes; numerical solutions using matlab/simulink. transfer functions: laplace transform and s domain; transfer function; characteristic equation and order; pole zero map; damping and system stability; final value theorem; transfer functions in matlab. time domain characteristics: response of first order, second order and higher order systems to different types of input; time response characteristics of first and second order systems. time response using matlab/simulink. closed loop systems: feedback control; open loop and closed loop transfer function; system type and error constants; steady state error with inputs for different system types. root locus: simple root locus; root locus using matlab; graphical method; angle and magnitude conditions pid control: fundamental operation of pid control; tuning of pid controllers using trial and error, ziegler nichols i method, ziegler nichols ii method and root locus method. design based on root locus: effect of pole/zero placement; lead control; lag control; lead lag control. 
polyphase circuits: balanced phase circuits, phase phasor representations; summation of currents to zero; neutral; star and delta connections; polyphase synchronous machines: production of rotating field by balanced excitation of polyphase windings; concept of synchronous operation; theory of synchronous operation; theory of synchronous machine with uniform airgap. equivalent circuits, voltage equations, alternative sink and source conventions, typical phasor diagrams, electrical power as a function of load angle; operation characteristics with constant power and varying excitation, relationships to phasor diagrams; torque angle. polyphase induction machines: transition from synchronous to asynchronous operation; derivation of exact equivalent circuit; modification of exact equivalent circuit by application of thevenin’s theorem; relationships between 'rotor current', mechanical power, and torque; condition for torque to be maximum; effect of changing rotor resistance on current and torque characteristics. electrical drives: review of simple ac drives, phase pwm voltage source inverter; freewheel diode function, dead time requirements; analogue implementation of phase pwm, constant voltage per hertz control of an induction machine; basic pm synchronous motor drive. three phase transformers: phase transformer construction; star and delta connection and effect on primary / secondary voltages and currents; phase shifts; reflected impedance. three phase distribution systems: phase power system analysis; per unit system; reasons for per units; choice of base, single phase representation of balanced polyphase operation. understand basic theory of control engineering and the system behaviour when subjected to demanded signals. knowledge of three term control system compensation, pid. knowledge of system stability using time and frequency domain characteristics. knowledge of using cad packages in the analysis and design of dynamic control systems. 
an understanding of items of electrical power equipment and systems. an awareness of induction and synchronous ac machines, three phase transformers, and their construction and operation. a basic awareness of power systems and operation. ability to analyse control systems and ac power equipment and systems.",[EEE8154],"the module introduces students to these aspects of electric drive systems: structure, configuration, motor and load types, high performance control, and industrial applications. it aims to enhance the students’ understanding of electric drive systems and to show the relationship between the theoretical and practical aspects of the subject. basic drive configurations and load characteristics; two and four quadrant drives; dynamic braking and regeneration; constant torque and field weakening strategies; examples for high bandwidth torque requirement dc drives: dc motor modelling; state space models; use of h bridge for variable supply voltage; current ripple; electrical and mechanical system transfer functions; armature current and rotor speed control; cascade control structures; digital control basics; position measuring devices; step by step tuning method for proportional integral controllers for drives; additive disturbance rejection and steady state error ac drives: configurations of three phase power electronic converter; induction motor drive basics and open loop v/f control; space vector theory; three phase to two phase transformation; pm machine dynamic equations; torque control of brushless dc motor; reference frame transformation; vector control of permanent magnet synchronous motor; dynamic model of induction motor; rotor flux oriented vector control of induction motor drives; decoupled flux and torque control; torque control at high dynamics; indirect and direct rotor flux oriented vector control of induction machines; voltage space vector generation through a three phase power electronic converter; mathematical basis for space vector 
modulation; centre aligned pwm modulation strategy; phase duty cycle calculations advanced control concepts and computer simulation: dc motor simulation; unipolar and bipolar h bridge modulation for dc drives; digital current and speed control of a dc motor drive; three phase power electronic converter with rl load; space vectors, phase and reference frame transformations; modelling and control of a permanent magnet synchronous motor drive case study: study of a 24v digitally controlled drive system – electronics design and control software issues knowledge of both ac and dc electric drives systems and their control. an awareness of commonly employed power circuits and control systems. how power devices may be controlled through microprocessors. knowledge of advanced techniques in drive system analysis.. ability to analyse and design control systems for electric drives. ability to construct and critically test drive simulation studies",0.899483


We omit comments on the documents featured in these comparisons, but again we see an apparently monotonic relationship between the semantic similarity of the documents and the displayed cosine similarities.
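
The nearest-target lookup used in the cells above can be sketched as a small reusable helper. This is only an illustrative sketch: the function name `nearest_pairs` and the toy similarity matrix are assumptions introduced here and are not part of the notebook's pipeline. The sketch also assumes that only pairs with `i < j` are of interest, so that self-pairs and mirrored duplicates are skipped.

```python
import numpy as np


def nearest_pairs(scores, targets):
    """For each target, return (target, i, j, score) for the pair i < j
    whose similarity is closest to that target.

    `nearest_pairs` is a hypothetical helper, not a function from the notebook.
    """
    scores = np.asarray(scores, dtype=float)
    # upper triangle without the diagonal: skips self-pairs and mirrored duplicates
    rows, cols = np.triu_indices(scores.shape[0], k=1)
    pair_scores = scores[rows, cols]
    results = []
    for target in targets:
        # index of the pair with the smallest absolute distance to the target
        k = int(np.abs(pair_scores - target).argmin())
        results.append((target, int(rows[k]), int(cols[k]), float(pair_scores[k])))
    return results


# toy symmetric similarity matrix, invented purely for illustration
toy = np.array([
    [1.00, 0.12, -0.08, 0.52],
    [0.12, 1.00, 0.31, 0.88],
    [-0.08, 0.31, 1.00, 0.69],
    [0.52, 0.88, 0.69, 1.00],
])
targets = [-0.1, 0.1, 0.3, 0.5, 0.7, 0.9]

toy_results = nearest_pairs(toy, targets)
for target, i, j, score in toy_results:
    print(f"target {target:+.1f}: pair ({i}, {j}) with similarity {score:+.2f}")
```

With real data, the matrix returned by `util.cos_sim` could be converted with `.numpy()` and passed in place of `toy`, and the returned indices used to look up entries in `codes` and `sentences`.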