Skip to content

kartheikiyer/chaotic_neural

Repository files navigation

chaotic_neural

Associative clustering and analysis of papers on the ArXiv

Documentation Status


This package aims at providing a trained model to find related papers on ArXiv given another paper or a set of keywords.

Read the documentation & try tutorials at chaotic-neural.readthedocs.io

It aims to be different from existing resources like the default ArXiv search, the new ADS, or ArXivsorter in that it uses Doc2Vec, an unsupervised algorithm that trains a shallow neural network to transform every document (in this case ArXiv abstracts) into a vector in a high-dimensional vector. Similar papers are then found by finding the closest vectors to one of interest in this space. This also allows for performing vector operations on keywords and documents to structure specific queries.

Users can either build their own model (by searching ArXiv with specific queries) or use the pre-trained model that has been trained on all the astro-ph.GA papers up to Sunday, May 23, 2021.

A live version of the tutorials can be found here on Google Colab.


You can load the pre-trained astro-ph-GA-23May2021 model using:

import chaotic_neural as cn
model_data = cn.load_trained_doc2vec_model('astro-ph-GA-23May2021')
model, all_titles, all_abstracts, all_authors, train_corpus, test_corpus = model_data

Having loaded the model, queries can now be run using ArXiv ids for papers as:

sims = cn.list_similar_papers(model_data, doc_id = 1903.10457, input_type='arxiv_id')

which returns

ArXiv id:  1903.10457
Title: Learning the Relationship between Galaxies Spectra and their Star
  Formation Histories using Convolutional Neural Networks and Cosmological
  Simulations
-----------------------------
Most similar/relevant papers: 
-----------------------------
0 Learning the Relationship between Galaxies Spectra and their Star
  Formation Histories using Convolutional Neural Networks and Cosmological
  Simulations  (Corrcoef: 0.98 )
1 A Critical Assessment of Photometric Redshift Methods: A CANDELS
  Investigation  (Corrcoef: 0.71 )
2 SHARDS: an optical spectro-photometric survey of distant galaxies  (Corrcoef: 0.66 )
3 Analysis of galaxy SEDs from far-UV to far-IR with CIGALE: Studying a
  SINGS test sample  (Corrcoef: 0.65 )
4 Probability density estimation of photometric redshifts based on machine
  learning  (Corrcoef: 0.65 )
5 Non-parametric Star Formation History Reconstruction with Gaussian
  Processes I: Counting Major Episodes of Star Formation  (Corrcoef: 0.63 )
6 On the recovery of galaxy properties from SED fitting solutions  (Corrcoef: 0.62 )
7 Photometric Redshifts and Systematic Variations in the SEDs of Luminous
  Red Galaxies from the SDSS DR7  (Corrcoef: 0.61 )
8 Recovering galaxy stellar population properties from broad-band spectral
  energy distribution fitting  (Corrcoef: 0.58 )
9 Estimating Spectra from Photometry  (Corrcoef: 0.57 )

Or using keywords:

sims = cn.list_similar_papers(model_data, doc_id = ['simulation','sed','fitting'], 
                           input_type='keywords', 
                           return_n=3, show_authors = True, show_summary=True)

which returns

Keyword(s):  ['simulation', 'sed', 'fitting']
multi-keyword
-----------------------------
Most similar/relevant papers: 
-----------------------------
0 Should we believe the results of UV-mm galaxy SED modelling?  (Corrcoef: 0.53 )
Authors:------
[{'name': 'Christopher C. Hayward'}, {'name': 'Daniel J. B. Smith'}]
Summary:------
We compare the properties inferred from the SED modelling with the true values and find that MAGPHYS recovers most physical parameters of the simulated galaxies well.
 
1 Morphology-assisted galaxy mass-to-light predictions using deep learning  (Corrcoef: 0.47 )
Authors:------
[{'name': 'Wouter Dobbels'}, {'name': 'Serge Krier'}, {'name': 'Stephan Pirson'}, {'name': 'Sébastien Viaene'}, {'name': 'Gert De Geyter'}, {'name': 'Samir Salim'}, {'name': 'Maarten Baes'}]
Summary:------
Spectral energy distribution (SED) fitting can make use of all available fluxes and their errors to make a Bayesian estimate of the M/L.
When we combine the morphology features with global g- and i-band luminosities, we find an improved estimate compared to a model which does not make use of morphology.
While our method was trained to reproduce global SED fitted M/L, galaxy morphology gives us an important additional constraint when using one or two bands.
 
2 Geometric and Dynamical Models of Reverberation Mapping Data  (Corrcoef: 0.46 )
Authors:------
[{'name': 'Anna Pancoast'}, {'name': 'Brendon J. Brewer'}, {'name': 'Tommaso Treu'}]
Summary:------
We present a general method to analyze reverberation mapping data that provides both estimates for the black hole mass and for the geometry and dynamics of the broad line region (BLR) in active galactic nuclei (AGN).

If you have any ideas for further applications for this, or ways it which we can extend the project, you can comment on the following Google Doc. If you are interested in contributing, please email me at kartheik.iyer@dunlap.utoronto.ca


This project was started during the hack session for .Astronomy 11 held at Toronto. See the full collection of present and past hacks at the hacks collector. Lots of thanks to Harshil Kamdar, Kate Storey-Fisher, Mubdi Rahman, Mariarosa Marinelli, Sarah Gossan and Bryan Gaensler for useful suggestions and data-wrangling help throughout the development process!

About

Associative clustering and analysis of papers on the ArXiv

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages