### In this notebook, we compare the clustering of atom-ph articles (clustering-atom-ph.ipynb) with the DAMOP 2016 sessions

In [131]:
from collections import Counter
import json
# Note: sklearn.externals.joblib was removed in scikit-learn 0.23; there, use `import joblib`
from sklearn.externals import joblib

In [132]:
# First, load the cluster predictor for atom-ph articles
# clf = joblib.load('cluster-AMO-winner.pkl')
# clf = joblib.load('cluster-AMO-optics.pkl')
clf = joblib.load('cluster-AMO.pkl')

In [133]:
# Second, load the DAMOP 2016 sessions and their abstracts
with open('../../damop data/damop2016.json') as f:
    damop = json.load(f)
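
The loading step relies on a particular layout for `damop2016.json`. Judging from how the loop further down accesses the data, the file is a list of sessions, each with `number`, `name`, and an `abstracts` list whose entries carry an `abstract` string. A minimal sketch of that layout, using a made-up placeholder record (the session and texts here are hypothetical, not taken from the real file, which may contain additional keys):

```python
import json

# Hypothetical miniature of the layout the notebook appears to expect.
sample = '''
[
  {"number": "X1", "name": "Example Session",
   "abstracts": [{"abstract": "first placeholder text"},
                 {"abstract": "second placeholder text"}]}
]
'''

sessions = json.loads(sample)

# Same access pattern as the notebook: one list of abstract strings per session
texts = [a['abstract'] for a in sessions[0]['abstracts']]
label = "{}: {}".format(sessions[0]['number'], sessions[0]['name'])
```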

In [134]:
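# Session names meant to be skipped
# (this list is not referenced by the cells shown in this notebook)
exclude_list = ['Graduate Student Symposium',
                'DAMOP Prize Session',
                'DAMOP Thesis Prize Session',
                ]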
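
The exclusion list is defined here but not applied in the classification loop, which is why the output still lists "1A: Graduate Student Symposium". One way it could be applied is sketched here, assuming exact matching on the session name; the helper `keep_session` and the two example records are hypothetical and not part of the original notebook.

```python
exclude_list = ['Graduate Student Symposium',
                'DAMOP Prize Session',
                'DAMOP Thesis Prize Session']


def keep_session(session, excluded=exclude_list):
    """Return True if the session name is not in the exclusion list."""
    return session['name'] not in excluded


# Hypothetical records in the same shape as the DAMOP data
example = [{'number': '1A', 'name': 'Graduate Student Symposium'},
           {'number': 'B4', 'name': 'Quantum Optics I'}]

kept = [s['number'] for s in example if keep_session(s)]
```

Applying such a filter would change the session counts reported at the end of the notebook, so the printed fractions would no longer match.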

In [135]:
sessions_all = 0
sessions_one_majority = 0
sessions_two_majority = 0

n_clusters = clf.get_params()['clf__n_clusters']
cluster_to_session = dict((x, []) for x in range(n_clusters))
sessions_unclassified = []

for session in damop:
    abstracts = [x['abstract'] for x in session['abstracts']]
    # Only consider sessions with 5 to 39 abstracts
    if 4 < len(abstracts) < 40:
        y = clf.predict(abstracts)
        count = Counter(y)
        # Up to two most frequent (cluster, count) pairs
        top = count.most_common(2)
        session_number_name = "{}: {}".format(session['number'], session['name'])
        print session_number_name
        sessions_all += 1

        if top[0][1] >= 0.5*len(abstracts):
            # A single cluster holds at least half of the abstracts
            print 'Majority cluster: {}'.format(top[0][0])
            sessions_one_majority += 1

            cluster_to_session[top[0][0]].append(session_number_name + ' (*)')

        elif top[0][1] + top[1][1] >= 0.5*len(abstracts):
            # The two largest clusters together hold at least half of the abstracts
            print 'Majority clusters: {}, {}'.format(top[0][0], top[1][0])
            sessions_two_majority += 1

            cluster_to_session[top[0][0]].append(session_number_name)
            cluster_to_session[top[1][0]].append(session_number_name)

        else:
            # No one or two clusters dominate: print the raw predictions
            print y
            sessions_unclassified.append(session_number_name)
        print ''

        if session['number'] == 'A1':
            break

1A: Graduate Student Symposium
Majority cluster: 9

B3: Quantum Gases with Dipolar Interactions
Majority clusters: 1, 10

B4: Quantum Optics I
Majority cluster: 9

B5: Many-Body Localization and Disorder
Majority clusters: 5, 9

B6: Progress in Spin-Orbit Coupling
Majority clusters: 2, 5

B7: Nonlinear Optics and Lasers
Majority cluster: 9

B9: Photoionization, Photodetachment and Photodissociation
Majority clusters: 16, 8

C4: Hybrid Quantum Systems
Majority clusters: 15, 24

C5: BEC with Strong Interactions
Majority clusters: 3, 0

C6: Quantum Gas Microscope
[ 5 14  9  5  9 19 15 22 14 22]

C7: Atomic Clocks
Majority cluster: 23

C9: Strong-Field Physics in Atoms, Molecules, and Clusters
Majority cluster: 16

G4: Quantum Measurement
Majority clusters: 9, 5

G5: Atomic Magnetometers I
Majority cluster: 9

G6: One-Dimensional Gases and Nanofibers
Majority clusters: 9, 11

G7: Interaction Effects in Spin-Orbit Coupled Gases
Majority clusters: 2, 9

G8: Time-Resolved Electron Dynamics an
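
The majority rule used in the loop can be isolated into a small function, which makes it easy to check against individual sessions. This is a sketch rather than the notebook's own code: the name `majority_clusters` and its return convention (a list of zero, one, or two cluster labels) are assumptions made for illustration.

```python
from collections import Counter


def majority_clusters(labels):
    """Return the clusters that dominate a session.

    One cluster if it holds at least half of the labels, two clusters if the
    two most common together hold at least half, otherwise an empty list.
    """
    n = len(labels)
    if n == 0:
        return []
    top = Counter(labels).most_common(2)
    if 2 * top[0][1] >= n:
        return [top[0][0]]
    if len(top) > 1 and 2 * (top[0][1] + top[1][1]) >= n:
        return [top[0][0], top[1][0]]
    return []


# Predictions printed above for session C6, which has no majority
c6 = [5, 14, 9, 5, 9, 19, 15, 22, 14, 22]
c6_result = majority_clusters(c6)
```

For C6 the two largest clusters account for only 4 of 10 abstracts, so the function returns an empty list, in line with C6 appearing among the unclassified sessions.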

#### Print DAMOP sessions that fall into each cluster.

In [136]:
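# For each cluster, feature indices sorted by descending centroid weight
order_centroids = clf.named_steps['clf'].cluster_centers_.argsort()[:, ::-1]

# Vocabulary terms, indexed consistently with the centroid columns
terms = clf.named_steps['vect'].get_feature_names()

for cluster, val in sorted(cluster_to_session.items()):
    # Cluster label followed by its ten highest-weighted terms
    print "Cluster {}: {}".format(cluster, ', '.join([terms[x] for x in order_centroids[cluster, :10]]))
    for session in val:
        print '    {}'.format(session)
    print ''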

Cluster 0: condensate, bose einstein condensate, einstein condensate, einstein, bose einstein, bose, bec, potential, einstein condensate bec, condensate bec
    C5: BEC with Strong Interactions
    P5: Transport and Spatial Dynamics
    T8: Quench Dynamics in Degenerate Gases

Cluster 1: resonances, molecules, feshbach, resonance, magnetic, ultracold, scattering, state, feshbach resonances, field
    B3: Quantum Gases with Dipolar Interactions
    H8: Molecular Control and Imaging
    M3: Focus: Cold and Ultracold Molecules
    N9: Cold and Ultracold Molecules II
    P8: Ultracold Bi-Alkalis (*)
    U6: Photoassociation and Collisons, Optical Feshbach Resonances

Cluster 2: spin orbit, orbit, spin, orbit coupling, spin orbit coupling, coupling, soc, rashba, topological, phase
    B6: Progress in Spin-Orbit Coupling
    G7: Interaction Effects in Spin-Orbit Coupled Gases

Cluster 3: scattering, body, scattering length, length, efimov, universal, dimer, range, bound, energy
    C5: BEC w
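
The `argsort()[:, ::-1]` idiom in the cell above ranks, for each cluster, the vocabulary indices by descending centroid weight, so the first ten indices give the cluster's most characteristic terms. A plain-Python sketch of the same ranking on a toy centroid matrix follows; the helper `top_terms`, the weights, and the three-word vocabulary are invented for illustration.

```python
def top_terms(centroids, terms, k=10):
    """For each centroid (a list of weights), return its k highest-weighted terms."""
    ranked = []
    for weights in centroids:
        # Indices of the weights, largest first
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        ranked.append([terms[i] for i in order[:k]])
    return ranked


# Toy vocabulary and centroid weights (hypothetical values)
terms = ['bec', 'feshbach', 'spin orbit']
centroids = [[0.9, 0.1, 0.0],
             [0.2, 0.7, 0.1]]

result = top_terms(centroids, terms, k=2)
```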

In [137]:
# Sessions in which no one or two clusters reach half of the abstracts
print 'Sessions without clusters'
for session in sessions_unclassified:
    print session
    print ''
print ''
# Clusters that were not a majority cluster for any session
print 'Clusters without sessions'
for cluster, sessions in sorted(cluster_to_session.items()):
    if not sessions:
        print "Cluster {}: {}".format(cluster, ', '.join([terms[x] for x in order_centroids[cluster, :10]]))

Sessions without clusters
C6: Quantum Gas Microscope

H6: Two-Dimensional Gases

H7: Few-body Systems

J5: Precision Measurements

J8: Impurities in Quantum Gases

M9: Novel Phases and Ordering in Fermi Gases

N7: Long-range or Anisotropic Interactions in Cold Gases

U3: Long Range Interactions


Clusters without sessions
Cluster 6: impurity, polaron, gas, mass, single impurity, interaction, impurity atom, impurities, fermi, impurity atoms
Cluster 13: functions, wave, states, matrix, method, energy, function, wave functions, potential, electron
Cluster 17: phase, phases, phase diagram, diagram, superfluid, lattice, quantum, density, model, transition
Cluster 18: alpha, proton, hyperfine, structure, fine, fine structure, mu, muonic, variation, hydrogen
Cluster 20: solitons, soliton, dark, bright, dark solitons, solutions, nonlinear, dark soliton, bose einstein, einstein


#### What fraction of the DAMOP sessions are covered by one or two clusters?

In [138]:
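# Fraction of sessions dominated by a single cluster
print float(sessions_one_majority) / sessions_all
# Fraction of sessions covered by one or two clusters
print float(sessions_one_majority + sessions_two_majority) / sessions_all
# Total number of sessions considered
print sessions_all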

0.372881355932
0.864406779661
59
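
The printed fractions are consistent with 22 of the 59 sessions having a single majority cluster and 51 of 59 being covered by one or two clusters. These counts are inferred from the output above rather than printed by the notebook. A small sketch of the same arithmetic, with a hypothetical helper `coverage`:

```python
def coverage(one_majority, two_majority, total):
    """Return (fraction with one majority cluster, fraction with one or two)."""
    one = float(one_majority) / total
    one_or_two = float(one_majority + two_majority) / total
    return one, one_or_two


# Counts inferred from the printed fractions: 22 single-majority sessions,
# 29 two-cluster sessions, 59 sessions in total
one, one_or_two = coverage(22, 29, 59)
```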
