### In this notebook, we compare the clusters found for atom-ph articles (see clustering-atom-ph.ipynb) with the DAMOP 2016 sessions

In [3]:
from collections import Counter
import json
try:
    import joblib  # standalone package
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn (removed in 0.23)

In [4]:
# First, load the cluster predictor trained on the atom-ph articles.
# Alternative saved models are left commented out.
# clf = joblib.load('cluster-AMO-winner.pkl')
clf = joblib.load('cluster-AMO-optics.pkl')
# clf = joblib.load('cluster-AMO.pkl')

In [5]:
# Second, load the DAMOP 2016 sessions and their abstracts
with open('../../damop data/damop2016.json') as f:
    damop = json.load(f)

In [6]:
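# Sessions we may want to leave out of the comparison.
# Note: this list is not applied in the loop below.
exclude_list = ['Graduate Student Symposium',
                'DAMOP Prize Session',
                'DAMOP Thesis Prize Session',
                ]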

In [7]:
sessions_all = 0
sessions_one_majority = 0
sessions_two_majority = 0

n_clusters = clf.get_params()['clf__n_clusters']
cluster_to_session = dict((x, []) for x in range(n_clusters))
sessions_unclassified = []

for session in damop:
    # Build a real list (not a lazy map object) so that len() works.
    abstracts = [x['abstract'] for x in session['abstracts']]
    # Only consider sessions with 5 to 39 abstracts.
    if 4 < len(abstracts) < 40:
        y = clf.predict(abstracts)
        count = Counter(y)
        top = count.most_common(2)  # up to two (cluster, n_abstracts) pairs
        half = 0.5 * len(abstracts)
        session_number_name = "{}: {}".format(session['number'], session['name'])
        print(session_number_name)
        sessions_all += 1

        if top[0][1] >= half:
            # One cluster holds at least half of the abstracts.
            print('Majority cluster: {}'.format(top[0][0]))
            sessions_one_majority += 1
            cluster_to_session[top[0][0]].append(session_number_name + ' (*)')

        elif top[0][1] + top[1][1] >= half:
            # The two largest clusters together hold at least half.
            print('Majority clusters: {}, {}'.format(top[0][0], top[1][0]))
            sessions_two_majority += 1
            cluster_to_session[top[0][0]].append(session_number_name)
            cluster_to_session[top[1][0]].append(session_number_name)

        else:
            print(y)
            sessions_unclassified.append(session_number_name)
        print('')

        if session['number'] == 'A1':
            break

1A: Graduate Student Symposium
Majority cluster: 13

B3: Quantum Gases with Dipolar Interactions
Majority clusters: 3, 4

B4: Quantum Optics I
Majority clusters: 0, 12

B5: Many-Body Localization and Disorder
Majority cluster: 4

B6: Progress in Spin-Orbit Coupling
Majority cluster: 8

B7: Nonlinear Optics and Lasers
[ 1 12 18  9  0  1 13  0 10 13]

B9: Photoionization, Photodetachment and Photodissociation
Majority cluster: 9

C4: Hybrid Quantum Systems
Majority clusters: 16, 5

C5: BEC with Strong Interactions
Majority cluster: 15

C6: Quantum Gas Microscope
Majority cluster: 3

C7: Atomic Clocks
Majority cluster: 7

C9: Strong-Field Physics in Atoms, Molecules, and Clusters
Majority cluster: 9

G4: Quantum Measurement
Majority clusters: 16, 13

G5: Atomic Magnetometers I
Majority clusters: 13, 5

G6: One-Dimensional Gases and Nanofibers
Majority clusters: 6, 11

G7: Interaction Effects in Spin-Orbit Coupled Gases
Majority clusters: 8, 3

G8: Time-Resolved Electron Dynamics and Attos
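
The rule applied in the loop above can be isolated into a small function. The following is a minimal sketch: the name `assign_clusters` and the two toy label lists are assumptions made for illustration and are not part of the original notebook. The B7 labels are copied from the output above.

```python
from collections import Counter


def assign_clusters(labels):
    """Return the clusters that together hold at least half of `labels`.

    Hypothetical helper mirroring the rule in the loop above: one cluster
    if it alone reaches half, otherwise the two largest clusters if they
    reach half together, otherwise an empty list (unclassified).
    """
    top = Counter(labels).most_common(2)
    half = 0.5 * len(labels)
    if top[0][1] >= half:
        return [top[0][0]]
    if len(top) > 1 and top[0][1] + top[1][1] >= half:
        return [top[0][0], top[1][0]]
    return []


# Labels printed above for session B7: no two clusters reach 5 of 10.
b7 = [1, 12, 18, 9, 0, 1, 13, 0, 10, 13]
print(assign_clusters(b7))

# Toy label lists (made up for illustration).
print(assign_clusters([4, 4, 4, 2, 7]))        # one cluster holds 3 of 5
print(assign_clusters([3, 3, 8, 8, 1, 5, 6]))  # two clusters hold 4 of 7
```

Because the threshold is "at least half" rather than "more than half", a session split evenly between two clusters is still counted as having a single majority cluster.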

#### Print DAMOP sessions that fall into each cluster.

In [8]:
# For each cluster, sort the term indices by decreasing centroid weight.
order_centroids = clf.named_steps['clf'].cluster_centers_.argsort()[:, ::-1]

# On scikit-learn >= 1.0, use get_feature_names_out() instead.
terms = clf.named_steps['vect'].get_feature_names()

for cluster, val in cluster_to_session.items():
    print("Cluster {}: {}".format(cluster, ', '.join([terms[x] for x in order_centroids[cluster, :10]])))
    for session in val:
        print('    {}'.format(session))
    print('')

Cluster 0: nonlinear, beam, beams, optical, propagation, wave, light, waves, medium, polarization
    B4: Quantum Optics I

Cluster 1: momentum, angular, angular momentum, oam, orbital, orbital angular, orbital angular momentum, beam, beams, light

Cluster 2: frequency, laser, comb, noise, optical, fiber, nm, frequency comb, clock, phase

Cluster 3: lattice, phase, superfluid, mott, hubbard, model, optical lattice, quantum, hubbard model, phases
    B3: Quantum Gases with Dipolar Interactions
    C6: Quantum Gas Microscope (*)
    G7: Interaction Effects in Spin-Orbit Coupled Gases
    G9: Optical Lattices and Quantum Magnetism (*)
    M2: Many-Body Physics in Quantum Simulation
    N7: Long-range or Anisotropic Interactions in Cold Gases
    P5: Transport and Spatial Dynamics
    T8: Quench Dynamics in Degenerate Gases (*)
    U3: Long Range Interactions

Cluster 4: quantum, time, systems, energy, states, dynamics, model, state, non, theory
    B3: Quantum Gases with Dipolar Interacti
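
The term lists above come from sorting each centroid's weights in decreasing order and keeping the first ten indices. Below is a minimal pure-Python sketch of that step. The helper `top_terms`, the four-word vocabulary, and the centroid weights are all made up for illustration; the notebook itself uses NumPy's `argsort` on `cluster_centers_`.

```python
def top_terms(centroid, terms, n=10):
    """Return the n terms with the largest weights in `centroid`."""
    # Indices sorted by decreasing weight, as argsort()[::-1] would give.
    order = sorted(range(len(centroid)), key=lambda i: centroid[i], reverse=True)
    return [terms[i] for i in order[:n]]


# Toy vocabulary and centroids (assumed values, not taken from the model).
toy_terms = ['lattice', 'laser', 'soliton', 'clock']
toy_centroids = [
    [0.70, 0.05, 0.01, 0.10],
    [0.02, 0.40, 0.03, 0.55],
]

for cluster, centroid in enumerate(toy_centroids):
    print("Cluster {}: {}".format(cluster, ', '.join(top_terms(centroid, toy_terms, n=2))))
```

The highest-weight terms are only a rough label for a cluster, which is why broad clusters such as Cluster 4 show generic words.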

In [9]:
print('Sessions without clusters')
for session in sessions_unclassified:
    print(session)
    print('')
print('')
print('Clusters without sessions')
for cluster, sessions in cluster_to_session.items():
    if not sessions:
        print("Cluster {}: {}".format(cluster, ', '.join([terms[x] for x in order_centroids[cluster, :10]])))

Sessions without clusters
B7: Nonlinear Optics and Lasers

H7: Few-body Systems

N5: Atom Interferometers

P9: Quantum Control II

U5: Precision Experiments


Clusters without sessions
Cluster 1: momentum, angular, angular momentum, oam, orbital, orbital angular, orbital angular momentum, beam, beams, light
Cluster 2: frequency, laser, comb, noise, optical, fiber, nm, frequency comb, clock, phase
Cluster 17: solitons, soliton, dark, bright, nonlinear, stable, stability, nonlinearity, dark solitons, solutions
Cluster 18: photonic, modes, mode, crystal, photonic crystal, waveguide, pt, optical, band, light
Cluster 19: surface, dielectric, index, plasmon, metamaterial, optical, metamaterials, plasmonic, transmission, metal


#### What fraction of the DAMOP sessions are covered by one or two clusters?

In [10]:
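# Fraction of sessions dominated by a single cluster
print(float(sessions_one_majority) / sessions_all)
# Fraction of sessions covered by one or two clusters
print(float(sessions_one_majority + sessions_two_majority) / sessions_all)
# Number of sessions considered
print(sessions_all)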

0.542372881356
0.915254237288
59
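
As a consistency check, the printed fractions can be converted back into session counts. This sketch uses only the three numbers printed above; the variable names are illustrative.

```python
# Values copied from the output above.
fraction_one = 0.542372881356
fraction_one_or_two = 0.915254237288
n_sessions = 59

# Recover integer counts from the rounded fractions.
n_one = int(round(fraction_one * n_sessions))
n_one_or_two = int(round(fraction_one_or_two * n_sessions))
n_two = n_one_or_two - n_one
n_unclassified = n_sessions - n_one_or_two

print("one cluster: {}, two clusters: {}, unclassified: {}".format(
    n_one, n_two, n_unclassified))
```

This gives 32 sessions with a single majority cluster, 22 covered by two clusters, and 5 unclassified, which matches the five sessions listed under "Sessions without clusters".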
