### In this notebook, we compare the clustering of atom-ph articles (clustering-atom-ph.ipynb) with the DAMOP 2016 abstracts

In [44]:
from collections import Counter
import json
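# sklearn.externals.joblib was removed in scikit-learn 0.23; newer environments use `import joblib`
from sklearn.externals import joblib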

In [173]:
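# First, load the cluster predictor trained on atom-ph articles: a Pipeline whose
# 'vect' step vectorizes the text and whose 'clf' step assigns clusters
clf = joblib.load('cluster-atom-ph.pkl')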

In [174]:
# Second, load the DAMOP 2016 sessions; each session holds a list of abstracts
with open('../../damop data/damop2016.json') as f:
    damop = json.load(f)

In [175]:
# Sessions meant to be skipped (note: this list is not applied in the loop below)
exclude_list = ['Graduate Student Symposium',
                'DAMOP Prize Session',
                'DAMOP Thesis Prize Session',
               ]
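Because `exclude_list` is never consulted, 1A: Graduate Student Symposium still appears in the classification output. A minimal sketch of how the list could be applied before classifying, assuming each session is a dict with `'number'`, `'name'`, and `'abstracts'` keys as the loop implies; `filter_sessions` and the toy sessions are hypothetical:

```python
# Hypothetical helper: drop sessions whose name appears in an exclusion list.
# Assumes the session schema implied by the notebook:
# {'number': ..., 'name': ..., 'abstracts': [{'abstract': ...}, ...]}
def filter_sessions(sessions, exclude):
    """Return the sessions whose 'name' is not in `exclude`."""
    excluded = set(exclude)
    return [s for s in sessions if s['name'] not in excluded]


exclude_list = ['Graduate Student Symposium',
                'DAMOP Prize Session',
                'DAMOP Thesis Prize Session']

# Toy sessions for illustration only (abstracts left empty)
toy_sessions = [
    {'number': '1A', 'name': 'Graduate Student Symposium', 'abstracts': []},
    {'number': 'C7', 'name': 'Atomic Clocks', 'abstracts': []},
]

kept = filter_sessions(toy_sessions, exclude_list)
print([s['number'] for s in kept])  # ['C7']
```

Matching on the exact session name keeps the filter simple; matching on session numbers would work equally well if the names vary between years.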

In [185]:
sessions_all = 0           # sessions that passed the size filter
sessions_one_majority = 0  # one cluster holds at least half of the abstracts
sessions_two_majority = 0  # the top two clusters together hold at least half

n_clusters = clf.get_params()['clf__n_clusters']
cluster_to_session = {x: [] for x in range(n_clusters)}
sessions_unclassified = []

for session in damop:
    abstracts = [a['abstract'] for a in session['abstracts']]
    # Only classify sessions with 6 to 39 abstracts
    if 5 < len(abstracts) < 40:
        y = clf.predict(abstracts)
        top = Counter(y).most_common(2)  # [(cluster, count), ...], largest first
        session_number_name = "{}: {}".format(session['number'], session['name'])
        print(session_number_name)
        sessions_all += 1

        if top[0][1] >= 0.5 * len(abstracts):
            # Single-cluster majority; flagged with (*) in the cluster listing
            print('Majority cluster: {}'.format(top[0][0]))
            sessions_one_majority += 1
            cluster_to_session[top[0][0]].append(session_number_name + ' (*)')

        elif top[0][1] + top[1][1] >= 0.5 * len(abstracts):
            # Two-cluster majority; the session is listed under both clusters
            print('Majority clusters: {}, {}'.format(top[0][0], top[1][0]))
            sessions_two_majority += 1
            cluster_to_session[top[0][0]].append(session_number_name)
            cluster_to_session[top[1][0]].append(session_number_name)

        else:
            # No majority: show the raw cluster labels
            print(y)
            sessions_unclassified.append(session_number_name)
        print('')

1A: Graduate Student Symposium
Majority cluster: 9

B3: Quantum Gases with Dipolar Interactions
Majority clusters: 17, 3

B4: Quantum Optics I
Majority cluster: 19

B5: Many-Body Localization and Disorder
Majority cluster: 5

B6: Progress in Spin-Orbit Coupling
Majority clusters: 12, 5

B7: Nonlinear Optics and Lasers
Majority cluster: 19

B9: Photoionization, Photodetachment and Photodissociation
Majority clusters: 8, 1

C4: Hybrid Quantum Systems
Majority clusters: 12, 19

C5: BEC with Strong Interactions
Majority cluster: 15

C6: Quantum Gas Microscope
Majority cluster: 5

C7: Atomic Clocks
Majority cluster: 4

C9: Strong-Field Physics in Atoms, Molecules, and Clusters
Majority cluster: 8

G4: Quantum Measurement
Majority cluster: 9

G5: Atomic Magnetometers I
Majority clusters: 12, 9

G6: One-Dimensional Gases and Nanofibers
Majority clusters: 19, 5

G7: Interaction Effects in Spin-Orbit Coupled Gases
Majority clusters: 9, 12

G8: Time-Resolved Electron Dynamics and Attosecond Spec
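The loop's classification rule can be restated as a small stand-alone function, which makes it easy to check on hand-made label lists. This is a sketch of the same rule, not code from the original notebook; `majority_clusters` and the sample labels are hypothetical, and the 0.5 threshold is the one hard-coded in the loop:

```python
from collections import Counter


def majority_clusters(labels, threshold=0.5):
    """Apply the one-/two-cluster majority rule to one session's labels.

    Returns (c,) if the most common cluster covers at least `threshold`
    of the labels, (c1, c2) if the two most common clusters together do,
    and () otherwise (the session stays unclassified).
    """
    if not labels:
        return ()
    n = len(labels)
    top = Counter(labels).most_common(2)
    if top[0][1] >= threshold * n:
        return (top[0][0],)
    if len(top) == 2 and top[0][1] + top[1][1] >= threshold * n:
        return (top[0][0], top[1][0])
    return ()


print(majority_clusters([4, 4, 4, 1, 2, 3]))                # (4,)
print(majority_clusters([17, 17, 17, 3, 3, 5, 6, 7, 8, 9]))  # (17, 3)
print(majority_clusters([1, 2, 3, 4, 5, 6]))                # ()
```

In the first example cluster 4 holds exactly half of the six labels, which already counts as a majority because the comparison is `>=`.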

#### What fraction of the DAMOP sessions have at least half of their abstracts in one cluster, or in two clusters combined?

In [183]:
print(sessions_one_majority * 1. / sessions_all)                            # one-cluster majority
print((sessions_one_majority + sessions_two_majority) * 1. / sessions_all)  # one- or two-cluster majority

0.440677966102
0.932203389831


#### List the DAMOP sessions assigned to each cluster, with the cluster's top 10 terms ((*) marks a single-cluster majority)

In [184]:
# For each cluster centroid, term indices sorted by weight, largest first
order_centroids = clf.named_steps['clf'].cluster_centers_.argsort()[:, ::-1]

# get_feature_names() was replaced by get_feature_names_out() in scikit-learn 1.0
terms = clf.named_steps['vect'].get_feature_names()

for cluster, sessions in cluster_to_session.items():
    print("Cluster {}: {}".format(cluster, ', '.join(terms[x] for x in order_centroids[cluster, :10])))
    for session in sessions:
        print('    {}'.format(session))
    print('')

Cluster 0: energy, functions, method, states, potential, density, function, wave, body, matrix
    P6: Cooling Methods and Interacting BEC's

Cluster 1: cross, cross sections, sections, cross section, section, electron, recombination, energy, photoionization, ionization
    B9: Photoionization, Photodetachment and Photodissociation
    J7: Effects of Collisions
    N8: Electronic, Atomic, and Molecular Collisions (*)

Cluster 2: cluster, coupled cluster, relativistic, coupled, calculations, relativistic coupled, relativistic coupled cluster, dipole, states, results

Cluster 3: molecules, field, trapping, ultracold, electric, polar, atoms, molecular, fields, magnetic
    B3: Quantum Gases with Dipolar Interactions
    H8: Molecular Control and Imaging
    M3: Focus: Cold and Ultracold Molecules (*)
    P8: Ultracold Bi-Alkalis

Cluster 4: frequency, laser, optical, clock, transition, cavity, nm, clocks, atomic, spectroscopy
    C7: Atomic Clocks (*)
    N5: Atom Interferometers

Cluster
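`cluster_centers_.argsort()[:, ::-1]` sorts each centroid's term weights in descending order, so the first 10 column indices of each row point at that cluster's heaviest terms. A pure-Python sketch of the same idea, assuming (as in a fitted Pipeline) that centroid columns line up with the vectorizer's feature names; `top_terms`, the vocabulary, and the centroid values are made up for illustration:

```python
def top_terms(centroids, terms, k=10):
    """For each centroid row, return the k terms with the largest weights.

    Equivalent to `centroids.argsort()[:, ::-1][:, :k]` mapped through
    `terms`, up to the order in which tied weights are listed.
    """
    result = []
    for row in centroids:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        result.append([terms[j] for j in order[:k]])
    return result


# Toy vocabulary and two made-up centroids (illustrative values only)
terms = ['clock', 'laser', 'molecules', 'polar', 'frequency']
centroids = [
    [0.30, 0.20, 0.01, 0.00, 0.25],  # a "clock-like" centroid
    [0.00, 0.05, 0.40, 0.22, 0.02],  # a "molecule-like" centroid
]
for i, words in enumerate(top_terms(centroids, terms, k=3)):
    print("Cluster {}: {}".format(i, ', '.join(words)))
# Cluster 0: clock, frequency, laser
# Cluster 1: molecules, polar, laser
```

The same term ('laser') can rank highly in several clusters, which is also visible in the real listing, where words such as "atoms" or "states" recur across clusters.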

#### Which sessions were not classified, and which clusters received no sessions?

In [193]:
print('Sessions without clusters')
for session in sessions_unclassified:
    print(session)
    print('')
print('')
print('Clusters without sessions')
for cluster, sessions in cluster_to_session.items():
    if not sessions:
        print("Cluster {}: {}".format(cluster, ', '.join(terms[x] for x in order_centroids[cluster, :10])))

Sessions without clusters
H6: Two-Dimensional Gases

N9: Cold and Ultracold Molecules II

P3: Interfacing Nanophotonics with Cold Atoms

T7: Spectroscopy, Lifetimes, Oscillator Strengths


Clusters without sessions
Cluster 2: cluster, coupled cluster, relativistic, coupled, calculations, relativistic coupled, relativistic coupled cluster, dipole, states, results
Cluster 7: alpha, hydrogen, hyperfine, proton, nuclear, structure, corrections, mu, fine, fine structure
Cluster 10: moments, electric, nuclear, edm, dipole, electric dipole, parity, dark, violating, dipole moments
Cluster 13: levels, strengths, lines, data, transitions, collision strengths, radiative, calculations, oscillator strengths, electron
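The check above only reports clusters that received nothing. A slightly richer summary counts, for each cluster, how many sessions it dominates outright versus shares with a second cluster. This is a sketch assuming `cluster_to_session` has the format built earlier (single-majority entries end in `' (*)'`); `cluster_coverage` is a hypothetical helper, and the sample mapping copies clusters 1 to 4 from the cluster listing earlier in this notebook:

```python
def cluster_coverage(cluster_to_session):
    """Map cluster -> (n_single_majority, n_shared) and list empty clusters."""
    summary = {}
    for c, sessions in cluster_to_session.items():
        single = sum(1 for s in sessions if s.endswith(' (*)'))
        summary[c] = (single, len(sessions) - single)
    empty = sorted(c for c, (a, b) in summary.items() if a + b == 0)
    return summary, empty


sample = {
    1: ['B9: Photoionization, Photodetachment and Photodissociation',
        'J7: Effects of Collisions',
        'N8: Electronic, Atomic, and Molecular Collisions (*)'],
    2: [],
    3: ['B3: Quantum Gases with Dipolar Interactions',
        'H8: Molecular Control and Imaging',
        'M3: Focus: Cold and Ultracold Molecules (*)',
        'P8: Ultracold Bi-Alkalis'],
    4: ['C7: Atomic Clocks (*)', 'N5: Atom Interferometers'],
}
summary, empty = cluster_coverage(sample)
print(summary)  # {1: (1, 2), 2: (0, 0), 3: (1, 3), 4: (1, 1)}
print(empty)    # [2]
```

Cluster 2 (relativistic coupled-cluster calculations) comes out empty here, matching the "Clusters without sessions" output.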
