Commit 0cfd06f - Updated index.rst
Srinivasan Kannan committed May 3, 2023 (1 parent: 65c2dfc)
Showing 1 changed file with 1 addition and 1 deletion: index.rst
@@ -525,7 +525,7 @@ Last row of the tableau always ends in 1 and right column is of the form 115 * (
Computing the Merit Score for the earlier CNF formulation of Multiple-Choice question-answering is a MAXSAT problem (finding the maximum number of satisfiable clauses, or correctly answered questions) which is NP-Hard (a brute-force sketch follows this paragraph). Factoid Question-Answering (questions and answers in short sentences based on fact keywords) lies, in terms of its complexity, between Multiple-Choice Q&A and Open-ended Q&A chatbots (e.g. IBM Watson's Jeopardy Q&A - https://www.nytimes.com/2021/07/16/technology/what-happened-ibm-watson.html , ChatGPT - GPT-3 - https://arxiv.org/pdf/2005.14165.pdf) - https://web.stanford.edu/~jurafsky/slp3/14.pdf describes a Lambda calculus-Logical formula meaning representation of factoid question-answering - Fig 14.11 SQuAD dataset, Fig 14.14 multiple logical meaning representations of Q&A (the Recursive Lambda Function Growth algorithm in NeuronRain is a graph theoretic meaning representation algorithm for texts based on beta-reduction of Lambda calculus). Open-ended Question-Answering could be reduced to the earlier 4-choice CNF MAXSAT format by grading the answers into 4 ranges of satisfaction percentages: 0-25%, 25-50%, 50-75%, 75-100% (grading in academics fits into this multiple choice CNF of grade variables - A+, A, B+, B, C, D, E, F, ...). STEM multiple choice Question-Answering admission tests, on the other hand, would depend not on corpus queries but on theorem provers and equation solvers, and are non-trivial AI problems. Combining upper and lower bounds from [Harsha-Klivans-Meka] and [O'Donnell-Servedio], the average sensitivity of CNF Question-Answering is lowerbounded by Ω(n^(1 − 1/(4(n^(1/3) * log^(2d/3) n) + 6))). The degree of a monomial in a PTF roughly corresponds to the difficulty of a question for that monomial. The CNF format of Multiple-Choice question-answering could be learnt in the Mistake-Bound Learning model in Ω(n^(n^(1/3) * log^(2d/3) n)) and O(n^(n^(1/3) * log n)), combining earlier results on the PTF degree of CNF, which is phenomenally hard - perhaps hinting that framing questions for an answer sample space (or answer-questioning) in real-life examinations is hard and falls in the category of O(n^n) or O(n!) complexity problems (e.g. brute-force Travelling Salesman is O(n!)). It is worth contrasting the techniques of Polynomial Interpolation over the Reals and learning a concept class in the Boolean setting, which is an interpolation of a Polynomial Threshold Function or Linear Threshold Function over GF(2): Barycentric Interpolation for learning a polynomial over the reals (https://people.maths.ox.ac.uk/trefethen/barycentric.pdf) is of linear time complexity, and Question-Answering could even be represented as a set of ordered pairs (question(i), score_for_answer_to_question(i)) over the reals and learnt by a polynomial interpolation algorithm. Statistical Query Dimension is defined (due to [Blum]) as the minimum number of statistical queries required to learn a Boolean concept class. Halfspace intersections over arbitrary dimensions are known to formalize any convex set (https://www.cs.umd.edu/class/spring2020/cmsc754/Lects/lect06-duality.pdf) and Question-Answering could be written in terms of halfspace intersections by reduction: the exact answer(i) to question(i) is a line separating a plane, any deviating answer(i) to question(i) lies either in the upper halfplane or the lower halfplane, and the intersection of halfplanes for all questions and answers represents a transcript - in other words, the sign of the polynomial is ternary: + and - correspond to wrong-answer halfplanes while 0 is the exact-answer line.
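The following is a minimal brute-force sketch of the 4-choice CNF MAXSAT merit score above (not the NeuronRain implementation; the clause encoding and example clauses are hypothetical). It enumerates all truth assignments, so its running time is exponential in the number of choice variables, consistent with NP-Hardness:

.. code-block:: python

   # Brute-force MAXSAT merit score for a toy 4-choice CNF Question-Answering
   # instance. Literals are signed integers: +v means variable v is True,
   # -v means variable v is False; variables are numbered from 1.
   from itertools import product

   # Hypothetical clauses, one per question, over 4 Boolean choice variables.
   clauses = [
       [1, -2, -3, -4],   # question 1: only choice 1 is the correct answer
       [-1, 2, -3, -4],   # question 2: only choice 2 is the correct answer
       [-1, -2, -3, 4],   # question 3: only choice 4 is the correct answer
   ]

   def merit_score(clauses, num_vars):
       """Maximum number of satisfied clauses over all truth assignments."""
       best = 0
       for assignment in product([False, True], repeat=num_vars):
           satisfied = sum(
               any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)
           best = max(best, satisfied)
       return best

   print(merit_score(clauses, num_vars=4))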
By a result due to [Klivans-Sherstov] - Unconditional Lower Bounds for Learning Intersections of Halfspaces - https://www.cs.utexas.edu/~klivans/mlj07-sq.pdf - any statistical-query algorithm for learning the intersection of √n halfspaces in n dimensions must make 2^Ω(√n) queries, implying an exponential lowerbound for Question-Answering in the Boolean setting. The exponential separation between polynomial interpolation over the Reals (R) and over GF(2) or Z2 is counterintuitive, while conventional wisdom would suggest the contrary. Another contradiction between polynomial interpolation in one variable over the reals and Boolean concept class mistake bound learning arises from Theorem 1: "If all c ∈ C have PTF degree d then C is learnable in the Mistake Bound model in time and mistake bound n^O(d)" of https://www.cs.utexas.edu/~klivans/f07lec5.pdf - again, real univariate polynomial interpolation is exponentially faster than Boolean degree-d PTF learning. The contradiction could perhaps be reconciled by the fact that univariate real polynomials correspond to univariate Boolean functions, and multivariate Boolean PTFs and LTFs must be matched with multivariate polynomial interpolations. More generic polynomial interpolation over several variables is a difficult problem involving Haar spaces - Definition 1.1 - http://pcmap.unizar.es/~gasca/investig/GSSurvey.pdf. In the context of admission tests, polynomial interpolation over the reals of a Question-Answering transcript (a set of ordered pairs of the form [question(i), score_for_answer_to_question(i)]) creates a family of polynomials for the set of candidates ranked, and the ranking could be by any distance measure between polynomials including total score - it is obvious that the integral of the polynomial (area under the polynomial) per candidate is the total score per candidate (a minimal interpolation-and-ranking sketch follows this paragraph). In NeuronRain, complexity upper and lower bounds of Interviews-Contests-Examinations (the problem of People intrinsic merit, or Talent analytics) have been investigated through the earlier multiple theoretical models of Question-Answering: 1) QBFSAT 2) Linear Threshold Function 3) Polynomial Threshold Function 4) CNFSAT 5) Polynomial Interpolation over Reals 6) Halfspace intersections. Answers to Multiple Choice Questions, which are words or phrases in most cases, could be embedded in a vector space (e.g. Word2Vec) and halfspace intersections could be defined over those word embeddings by unique straight lines passing through the correct answer choice word vector for each question, which intersect to form a convex polytope separating the interior from the exterior vector halfspaces of wrong answer choices. There is also a formal language theoretic facet to Question-Answering as opposed to statistical solutions - every natural language question could be parsed by a mildly context sensitive tree adjoining grammar (TAG) to get a parse tree meaning representation, and its answer could be another TAG production rule parse tree from which natural language sentences could be generated. How an answer TAG parse tree is obtained from a question TAG parse tree is subjective; one choice is to traverse the keyword vertices of the question TAG tree (Verb phrase, Noun phrase) and replace them by a query result from a corpus, e.g. the Noun phrase "biggest (adjective) city (noun1) in the world (noun2)" in the question TAG is replaced by the query result for biggest city (from a corpus) in the answer TAG.
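A minimal sketch of the transcript interpolation and integral-based ranking described above (not the NeuronRain implementation; the per-question candidate scores are hypothetical):

.. code-block:: python

   # Interpolate each candidate's transcript, the ordered pairs
   # (question(i), score_for_answer_to_question(i)), by a polynomial over the
   # reals and rank candidates by the integral (area under the polynomial).
   import numpy as np

   def transcript_polynomial(scores):
       """Interpolating polynomial through (i, score_i), degree len(scores)-1."""
       x = np.arange(1, len(scores) + 1)
       return np.polynomial.Polynomial.fit(x, scores, deg=len(scores) - 1)

   def total_score(poly, num_questions):
       """Area under the transcript polynomial on [1, num_questions]."""
       antiderivative = poly.integ()
       return antiderivative(num_questions) - antiderivative(1)

   # Hypothetical per-question scores for two candidates
   transcripts = {"candidate_A": [0.9, 0.4, 0.8, 0.7],
                  "candidate_B": [0.5, 0.6, 0.7, 0.8]}
   ranking = sorted(((name, total_score(transcript_polynomial(s), len(s)))
                     for name, s in transcripts.items()),
                    key=lambda pair: -pair[1])
   print(ranking)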
Multiple TAG parsing algorithms for natural language sentences have been published (Earley-type, LR-type, CYK-type) having best and worst case time complexity of O(n^6) - https://www.cs.helsinki.fi/group/xmltools/treelang/tagintro.pdf , [Yves Schabes, Aravind Joshi] - Figure 3: Trees selected by: "The men who hate women that smoke cigarettes are intolerant" - https://repository.upenn.edu/cgi/viewcontent.cgi?article=1574&context=cis_reports , [Vijay-Shanker, Aravind Joshi] - Section 2.2 - simple linguistic examples - tree adjunctions - https://aclanthology.org/H86-1020.pdf . The complexity of the Recursive Gloss Overlap textgraph algorithm has been analyzed in sections 191 and 201 of NeuronRain Design: for W keywords in the document the time bound is O(W*x^(2d)) for the entire text input (W = number of keywords, d = depth, x = average size of a gloss definition), while TAG trees have to be parsed for each natural language sentence in the text, i.e. O(n^6*S) for S sentences (a gloss-overlap textgraph sketch follows this paragraph). Both TAG trees and Recursive Lambda Function Growth are graph theoretic and formal language meaning representation algorithms and parse one dimensional flat text into trees and graphs. The ChatGPT Question-Answering bot is in complexity class TC=AC=SAC=NC, being a humongous neural network threshold circuit, and TAG parsing of mildly context sensitive natural languages is O(n^6), i.e. in P - both implying Question-Answering AI is in NC or P. LSTM and Convolutional Neural Networks have been employed for faster TAG parsing - End-to-end Graph-based TAG Parsing with Neural Networks - [Yale and ElementalCognition] - https://arxiv.org/pdf/1804.06610v1.pdf
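A minimal, independent sketch of the Recursive Gloss Overlap style textgraph growth (not the NeuronRain implementation; it assumes NLTK WordNet data is downloaded and, for simplicity, expands only the first synset gloss per word):

.. code-block:: python

   # Grow a textgraph by recursively expanding keywords into their WordNet
   # gloss (definition) tokens up to depth d - the node count grows roughly
   # as O(W * x^(2d)) for W keywords and average gloss size x.
   import networkx as nx
   from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

   def gloss_tokens(word):
       """Tokens of the first WordNet synset definition (gloss) of word."""
       synsets = wn.synsets(word)
       return synsets[0].definition().split() if synsets else []

   def recursive_gloss_overlap_graph(keywords, depth=2):
       graph = nx.DiGraph()
       frontier = list(keywords)
       for _ in range(depth):
           next_frontier = []
           for word in frontier:
               for token in gloss_tokens(word):
                   graph.add_edge(word, token)   # keyword -> gloss token edge
                   next_frontier.append(token)
           frontier = next_frontier
       return graph

   textgraph = recursive_gloss_overlap_graph(["city", "world"], depth=2)
   print(textgraph.number_of_nodes(), "nodes,",
         textgraph.number_of_edges(), "edges")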
64. Mining patterns in Astronomy Datasets has been less studied in BigData - NeuronRain (originally intended to be an astronomy software) brings astronomy and cosmology datasets (Ephemeris data of celestial bodies, Gravitational pull and Correlation of Celestial N-Body choreographies to terrestrial extreme weather events, Climate analytics, Satellite weather GIS imagery, Space Telescope Deep Field Imagery of Cosmos) into the machine learning and artificial intelligence mainstream. For example, Red-Green-Blue channel histogram analysis of the Hubble Ultra Deep Field in NeuronRain seems to show an anomaly in the percentages of Red (farthest, highest redshift), Green (farther), Blue (far) galaxies - the ratio of Red:Green:Blue galaxies is 3:1:2 while intuition would suggest the contrary, 1:2:3 (a Deep Field is a light cone search and the Red-Green-Blue channels of a Deep Field are circular intersections of the light cone at different time points of the past; as galaxies would appear more spread out in proportion to distance in expanding spacetime, the Red-Green-Blue circular disks should theoretically contain an increasing number of galaxies in the order Red < Green < Blue). Possibly this contradiction could be explained by the Einstein Field Equations - https://en.wikipedia.org/wiki/Einstein_field_equations - which account for per-body spacetime curvature, warping the light cone of the Deep Field. An example Python RGB analysis with histogram plots of the Hubble eXtreme Deep Field (2012) imagery is documented in https://scientific-python.readthedocs.io/en/latest/notebooks_rst/5_Image_Processing/02_Examples/Image_Processing_Tutorial_3.html and a minimal channel-dominance sketch is given below.
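A minimal sketch of such a Red-Green-Blue channel analysis (the image filename is hypothetical; per-pixel channel dominance is only a crude proxy for galaxy counts per redshift band):

.. code-block:: python

   # Per-pixel dominant-channel counts of a deep field image as a crude
   # proxy for the Red:Green:Blue galaxy ratio discussed above.
   import numpy as np
   from PIL import Image

   image = np.asarray(Image.open("hubble_xdf.png").convert("RGB"))  # hypothetical path
   pixels = image.reshape(-1, 3).astype(int)
   dominant = pixels.argmax(axis=1)              # 0 = Red, 1 = Green, 2 = Blue
   counts = np.bincount(dominant, minlength=3)
   print("R:G:B dominance ratio =", np.round(counts / counts.sum(), 3))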
65. The problem of text restoration in archaeology pertains to the reconstruction of ancient damaged manuscripts with missing text (in redacted versions), e.g. the Dead Sea Scrolls - https://www2.cs.uky.edu/dri/dead-sea-scrolls/ , https://www.deadseascrolls.org.il/featured-scrolls . Traditionally scripts have been stored with an associated Unicode-ASCII value, which is sufficient for deciphered natural languages. Undeciphered inscriptions and texts in manuscripts could be stored as polynomials (one polynomial per symbol defining the shape of the script), which facilitates algebraic and topological text restoration by interpolating missing fragments of a text by homeomorphic deformations, polynomial interpolation or polynomial reconstruction. The Polynomial Reconstruction Problem is defined as (from https://eprint.iacr.org/2004/217.pdf): "Definition 1 Polynomial Reconstruction (PR) - Given a set of points over a finite field {zi , yi} i=1 to n, and parameters [n, k, w], recover all polynomials p of degree less than k such that p(zi) != yi for at most w distinct indexes i ∈ {1, . . . , n}. .....". The problem of text restoration is exactly the problem of polynomial reconstruction: recover all contour polynomials p of degree less than k for damaged symbols such that p(zi) != yi for at most w distinct indices (the number w could be a fraction of the missing fragment of a symbol polynomial). Vowelless text (de)compression is a text restoration problem wherein missing vowels in compressed text have to be accurately reconstructed, and transformers could serve as vowelless text decompressors by rephrasing the missing-word inference problem in Figure 1 of https://www.amacad.org/sites/default/files/publication/downloads/Daedalus_Sp22_09_Manning.pdf as a missing-vowel inference problem. Deep learning frameworks have been demonstrated for text restoration (decree from the Acropolis of Athens - 485 BC - https://github.com/deepmind/ithaca/blob/main/images/inscription.png). In algebraic terms, damaged symbols in the manuscript are piecewise discontinuous polynomials which are smoothed to a continuous polynomial by looking up a tabular map of symbols-to-polynomials, retrieving the best matching fragment and splicing it onto the damaged symbol polynomial (a toy polynomial reconstruction sketch follows).
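A toy sketch of such a reconstruction over the reals (the PR definition above is stated over finite fields; the sample points, degree bound k and corruption bound w here are hypothetical):

.. code-block:: python

   # Brute-force polynomial reconstruction: find a degree < k polynomial that
   # disagrees with at most w of the n sampled contour points of a damaged symbol.
   from itertools import combinations
   import numpy as np

   def reconstruct(points, k, w):
       xs = np.array([x for x, _ in points], dtype=float)
       ys = np.array([y for _, y in points], dtype=float)
       n = len(points)
       for subset in combinations(range(n), k):
           # Fit a degree k-1 polynomial through a k-subset of the points
           coeffs = np.polyfit(xs[list(subset)], ys[list(subset)], k - 1)
           agreements = np.sum(np.isclose(np.polyval(coeffs, xs), ys))
           if agreements >= n - w:
               return np.poly1d(coeffs)
       return None

   # Samples of a contour x^2 with one corrupted reading at x = 4
   samples = [(0, 0), (1, 1), (2, 4), (3, 9), (4, 99), (5, 25)]
   print(reconstruct(samples, k=3, w=1))   # recovers approximately x^2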
66. The following are related in the sense of how each area of research (algebra, geometry and topology) views and constructs a polynomial curve passing through a set of points: the Polynomial Reconstruction Problem, Polynomial Interpolation, Four Bar Linkage-Alt's Problem-Coupler curves-Nine point synthesis in algebraic geometry, and the Path Homotopy H connecting two functions F(x) and G(x) and tracking the continuous deformations from F(x) to G(x), defined by H(x;t) = tF(x) + (1 − t)G(x), in Numerical Algebraic Geometry - https://en.wikipedia.org/wiki/Numerical_algebraic_geometry. Homotopy formalizes the real-world computer graphics problem of morphing one image into another - which could be visuals of a human face, handwriting or a fingerprint - (or) continuously deforming the contour polynomials of image1 into those of image2 by a parameter. This makes homotopy an indispensable tool for topologically recognizing visual similarities, while conventional literature devotes much to the stochastic machine learning side of them, e.g. face recognition is dominated by deep learning CNNs - in topological terms the extent of homotopic deformation required to morph one visual into the other (Homotopic Morphing of Planar Curves - https://www.wisdom.weizmann.ac.il/~ylipman/2015_homotopic_morphing.pdf) defines the distance between them (often denoted as Homotopy equivalence - https://en.wikipedia.org/wiki/Homotopy#Homotopy_equivalence ; the Homotopy Fundamental Group - https://en.wikipedia.org/wiki/Fundamental_group - quantifies how many polynomials of a topological space can be deformed into one another). A numerical sketch of the linear path homotopy follows.
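A minimal numerical sketch of that linear path homotopy (the two contour polynomials F and G are hypothetical; the L2 gap between them is used as a crude deformation distance):

.. code-block:: python

   # Deform contour polynomial G into F via H(x;t) = t*F(x) + (1-t)*G(x),
   # sampling intermediate curves, and measure an L2 deformation distance.
   import numpy as np

   F = np.poly1d([1, 0, -1])    # F(x) = x^2 - 1  (hypothetical contour 1)
   G = np.poly1d([0, 1, 0])     # G(x) = x        (hypothetical contour 2)

   xs = np.linspace(-1.0, 1.0, 201)
   for t in (0.0, 0.25, 0.5, 0.75, 1.0):
       H = t * F(xs) + (1 - t) * G(xs)          # intermediate deformed curve
       print("t =", t, " H(0) =", round(H[100], 3))

   gap = F(xs) - G(xs)
   l2_distance = np.sqrt(np.mean(gap ** 2) * (xs[-1] - xs[0]))
   print("L2 deformation distance ~", round(l2_distance, 3))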


**What are some unusual applications of Factorization implemented in NeuronRain?**