This file uses Longitudinal Graph Analysis (LGA) concepts. LGA was developed by Archive-It (https://support.archive-it.org/hc/en-us/articles/360039291992-Longitudinal-Graph-Analysis-LGA-files) for use in Gephi or an image plot. Here it is used to sonify the ENMI data.

In [4]:
import pandas as pd
from midiutil import MIDIFile
import math

In [5]:
# MIDI parameters shared by the cells below
track    = 0
channel  = 0
time     = 0    # in beats
duration = 1    # in beats
tempo    = 60   # in BPM
volume   = 62   # note velocity, 0-127 as per the MIDI standard

# Earlier experiment, kept for reference:
#num_tracks = 16
#MyMIDI = MIDIFile(num_tracks)
#MyMIDI.addTempo(track, time, tempo)

# MIDIUtil signature: addNote(track, channel, pitch, time, duration, volume)
#MyMIDI.addNote(track, channel, pitch, time, duration, volume)
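With the tempo set to 60 BPM, durations in beats map directly onto clock time. The small helper below is hypothetical (it is not part of MIDIUtil); it only makes the conversion explicit, for example to estimate how long a generated file will play.

```python
def beats_to_seconds(beats, tempo_bpm=60):
    """Convert a duration in beats to seconds: one beat lasts 60 / tempo seconds."""
    return beats * 60.0 / tempo_bpm

# At 60 BPM a beat is one second; at 120 BPM it is half a second.
print(beats_to_seconds(8))         # 8.0
print(beats_to_seconds(8, 120))    # 4.0
```

In the per-user network cells each user is given one beat, so at this tempo a session with 30 users lasts roughly 30 seconds, plus the tail of any held Polemic notes.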

In [6]:
sessions = pd.read_csv('../sessions.csv')
enmi = pd.read_csv('../enmi.csv')
annotations = pd.read_csv('../annotations.csv')

In [None]:
sessions.head()

In [None]:
enmi.head()

Join the ENMI and annotations data back together on the tweet `id` from the ENMI data. The two sources may differ because the ENMI data was regenerated (rehydrated) from Twitter to comply with its terms and conditions: some tweets may since have been deleted or made private and can no longer be retrieved.
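One way to measure how far the rehydrated tweets and the annotations have drifted apart is pandas' `indicator=True` option on `merge`, which adds a `_merge` column marking each row as `both` or `left_only`. This is a minimal sketch on toy frames; the ids and values are invented for illustration, and the frames are named `toy_*` so they do not overwrite the real data.

```python
import pandas as pd

# Toy stand-ins for enmi.csv and annotations.csv (hypothetical values)
toy_enmi = pd.DataFrame({"id": [1, 2, 3], "text": ["a ++", "b", "c =="]})
toy_annotations = pd.DataFrame({"id": [1, 3, 4], "ann": ["OK", "REF", "Q"]})

# Left join keeps every rehydrated tweet; _merge records whether an annotation matched
merged = pd.merge(toy_enmi, toy_annotations, how="left", on="id", indicator=True)
print(merged["_merge"].value_counts())

# Annotations whose tweet is absent from the rehydrated data (e.g. deleted or private)
missing = toy_annotations[~toy_annotations["id"].isin(toy_enmi["id"])]
print(missing)
```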

In [7]:
annotation_data = pd.merge(enmi,annotations, how='left', on='id')
annotation_data.head()

Unnamed: 0.1,Unnamed: 0,created_at,id,id_str,text,truncated,source,in_reply_to_status_id,in_reply_to_status_id_str,in_reply_to_user_id,...,quoted_status.contributors,quoted_status.is_quote_status,quoted_status.retweet_count,quoted_status.favorite_count,quoted_status.favorited,quoted_status.retweeted,quoted_status.lang,ann,start,end
0,0,Wed Dec 19 13:30:12 +0000 2018,1075382796179386369,1075382796179386369,"@robywebo @OdileA Bonjour, je parlais du panel...",False,"<a href=""http://twitter.com/download/iphone"" r...",1.075374e+18,1.075374e+18,182802200.0,...,,,,,,,,,,
1,1,Wed Dec 19 13:33:53 +0000 2018,1075383722323689472,1075383722323689472,#enmi18 @vincentpuig ou se trouve ces outils? ...,False,"<a href=""https://polemictweet.com"" rel=""nofoll...",,,,...,,,,,,,,,675000.0,675000.0
2,2,Wed Dec 19 13:34:23 +0000 2018,1075383846185635840,1075383846185635840,#enmi18 Paolo Vignola et Sara Baranzoni : inve...,False,"<a href=""https://polemictweet.com"" rel=""nofoll...",,,,...,,,,,,,,REF,705000.0,705000.0
3,3,Wed Dec 19 13:40:59 +0000 2018,1075385508635783170,1075385508635783170,Dernier session des #enmi18 : Le terrain du te...,True,"<a href=""https://about.twitter.com/products/tw...",,,,...,,,,,,,,,,
4,4,Wed Dec 19 13:45:22 +0000 2018,1075386610961784837,1075386610961784837,@Isabell42560134 #enmi18 Suivez les liens vers...,False,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",1.075384e+18,1.075384e+18,1.070035e+18,...,,,,,,,,,,


In [8]:
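def find_start_end(session):
    # session is a row of sessions.csv; its first two values are used as the
    # session's start and end times (same units as the annotation 'start'/'end').
    # Keeps annotations that start inside the session and end before it closes.
    # Rows without an annotation (NaN start/end) fail both comparisons and are dropped.
    notes = annotation_data[(annotation_data['start'] >= session[0]) & (annotation_data['end'] < session[1])]
    return notes[['text', 'user.screen_name', 'source']]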

A list comprehension collects the tweets for each session. Sessions are identified by the times in the metadata where start and end differ; where start and end are the same, the annotation is a single tweet.

The sampling here follows the sessions in the recording. The choice of sample window may alter both the data we obtain and the resulting model. Sampling lets us engage with time-axis media by cutting the data into segments small enough to be processed.
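The windowing test in `find_start_end` can be sketched in plain Python to show which annotations each session keeps. The session bounds and annotation times below are hypothetical and assumed to be in milliseconds from the start of the recording.

```python
# Hypothetical session bounds and annotations (times assumed to be in milliseconds)
toy_sessions = [(0, 600_000), (600_000, 1_200_000)]
toy_annotations = [
    {"text": "#enmi18 question ??", "start": 120_000, "end": 120_000},
    {"text": "#enmi18 ref ==", "start": 705_000, "end": 705_000},
    {"text": "long span", "start": 650_000, "end": 1_250_000},
]

def in_session(ann, session):
    """Same test as find_start_end: starts inside the window, ends strictly before it closes."""
    s_start, s_end = session
    return ann["start"] >= s_start and ann["end"] < s_end

def is_point(ann):
    """A point annotation (start == end) corresponds to a single tweet."""
    return ann["start"] == ann["end"]

per_session = [[a["text"] for a in toy_annotations if in_session(a, s)] for s in toy_sessions]
points = [a["text"] for a in toy_annotations if is_point(a)]
print(per_session)
print(points)
```

The "long span" annotation ends after the second session closes, so it is assigned to neither session: annotations that straddle a boundary are silently dropped by this test.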

In [9]:
tweets = [find_start_end(session) for session in sessions.to_numpy()]

## User Network

Create an ongoing network of the users active in each session, and write it out as one MIDI file per session.
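The cell below builds, for each session, a tally of tweets per user along with counts of the Polemic markers. The same tally is sketched here as a standalone function so it can be checked on a few sample rows; the marker-to-category mapping copies the one the notebook uses, and the function name `tally_users` is hypothetical.

```python
from collections import defaultdict

# Marker-to-category mapping as used in the notebook cells
MARKERS = {"++": "OK", "??": "KO", "**": "Q", "==": "REF"}

def tally_users(rows):
    """rows: iterable of (text, screen_name) pairs.
    Counts tweets per user and, per category, how many of those tweets contain the marker."""
    counts = defaultdict(lambda: {"value": 0, "OK": 0, "KO": 0, "Q": 0, "REF": 0})
    for text, user in rows:
        text = str(text)  # guard against missing (NaN) text
        counts[user]["value"] += 1
        for marker, label in MARKERS.items():
            if marker in text:
                counts[user][label] += 1
    return dict(counts)

sample = [("great point ++", "alice"), ("source? ==", "alice"), ("hmm ??", "bob")]
result = tally_users(sample)
print(result)
```

Because the check uses `in`, a tweet containing `++` twice still adds only 1, which matches the notebook's behaviour.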

In [50]:
from collections import Counter
#import networkx as nx
#from networkx.readwrite import json_graph
import json

num_tracks = 4

#G = nx.Graph()

sess = 0

for t in tweets:
    # Per-session tally: tweets per user plus counts of each Polemic marker
    c = {}
    lgaMIDI = MIDIFile(num_tracks)
    lgaMIDI.addTempo(track, time, tempo)
    sess += 1
    print("session:", sess)

    for tweet in t.to_numpy():
        # tweet = [text, user.screen_name, source]; str() guards against missing (NaN) text
        _text = str(tweet[0])
        user = tweet[1]

        if user in c:
            c[user]['value'] += 1
        else:
            c[user] = {'value': 1, 'OK': 0, 'KO': 0, 'Q': 0, 'REF': 0}

        if "++" in _text:
            c[user]['OK'] += 1
        if "??" in _text:
            c[user]['KO'] += 1
        if "**" in _text:
            c[user]['Q'] += 1
        if "==" in _text:
            c[user]['REF'] += 1

        # Useful for visualisation; sonification only needs the source, target and weights.
        #G.add_node(sess)
        #G.add_node(user)
        #G.add_edge(sess, user)
        #G[sess][user]['weight'] = c[user]['value']

    # Create the graph now: one beat per user (the session -> user edge)
    beat = 0
    for user, counts in c.items():
        beat += 1
        # node/edge pair: a pitch for the session, then a fixed pitch for the user
        lgaMIDI.addNote(track, channel, 40 + sess, beat, 0.6, volume)
        lgaMIDI.addNote(track, channel, 42, beat + 0.4, 0.6, volume)
        # Polemic parts: held notes whose length in beats is the marker count
        if counts['OK'] > 0:  lgaMIDI.addNote(track, channel, 32, beat, counts['OK'], volume)
        if counts['KO'] > 0:  lgaMIDI.addNote(track, channel, 27, beat, counts['KO'], volume)
        if counts['Q'] > 0:   lgaMIDI.addNote(track, channel, 24, beat, counts['Q'], volume)
        if counts['REF'] > 0: lgaMIDI.addNote(track, channel, 20, beat, counts['REF'], volume)

    with open(str(sess) + "_lga.mid", "wb") as f:
        lgaMIDI.writeFile(f)

#json.dumps(json_graph.node_link_data(G, {'link': 'edges', 'source': 'from', 'target': 'to', 'weight': 'weight'}))

session: 1
session: 2
session: 3
session: 4
session: 5
session: 6
session: 7
session: 8


Create a single sonification for the whole event, with user counts accumulated across all sessions.

In [None]:
from collections import Counter
import json

num_tracks = 4

# One MIDI file for the whole event
lgaMIDI = MIDIFile(num_tracks)
lgaMIDI.addTempo(track, time, tempo)

sess = 0   # stays 0, so the "session" pitch below is a constant 40
c = {}     # accumulated across all sessions
for t in tweets:
    for tweet in t.to_numpy():
        _text = str(tweet[0])   # guard against missing (NaN) text
        user = tweet[1]

        if user in c:
            c[user]['value'] += 1
        else:
            c[user] = {'value': 1, 'OK': 0, 'KO': 0, 'Q': 0, 'REF': 0}

        if "++" in _text:
            c[user]['OK'] += 1
        if "??" in _text:
            c[user]['KO'] += 1
        if "**" in _text:
            c[user]['Q'] += 1
        if "==" in _text:
            c[user]['REF'] += 1

# Create the graph now: one beat per user
beat = 0
for user, counts in c.items():
    beat += 1
    # node/edge pair
    lgaMIDI.addNote(track, channel, 40 + sess, beat, 0.6, volume)
    lgaMIDI.addNote(track, channel, 42, beat + 0.4, 0.6, volume)
    # Polemic parts: held notes whose length in beats is the marker count
    if counts['OK'] > 0:  lgaMIDI.addNote(track, channel, 32, beat, counts['OK'], volume)
    if counts['KO'] > 0:  lgaMIDI.addNote(track, channel, 27, beat, counts['KO'], volume)
    if counts['Q'] > 0:   lgaMIDI.addNote(track, channel, 24, beat, counts['Q'], volume)
    if counts['REF'] > 0: lgaMIDI.addNote(track, channel, 20, beat, counts['REF'], volume)

with open("lga.mid", "wb") as f:
    lgaMIDI.writeFile(f)

## Polemic Annotations

Extract the Polemic Tweet annotations from each session, possibly to use as a soundscape or drone. The annotations are already placed on the timeline. They are a different way of using the network to write: they enable the authors to think, although they are a controlled way of creating a memory.
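Separating marker extraction from MIDI rendering makes the Polemic timeline easy to inspect before it is sonified. This sketch returns `(beat, category)` pairs using the same one-beat-per-tweet timing and marker mapping as the cell below; the function name `polemic_events` is hypothetical.

```python
# Marker-to-category mapping as used in the notebook cells
MARKERS = {"++": "OK", "??": "KO", "**": "Q", "==": "REF"}

def polemic_events(texts):
    """Return (beat, category) pairs; beat is the 1-based position of the tweet in the session."""
    events = []
    for beat, text in enumerate(texts, start=1):
        text = str(text)  # missing (NaN) text becomes "nan" and matches nothing
        for marker, label in MARKERS.items():
            if marker in text:
                events.append((beat, label))
    return events

events = polemic_events(["intro", "yes ++", "why?? ==", float("nan")])
print(events)  # [(2, 'OK'), (3, 'KO'), (3, 'REF')]
```

A tweet can carry several markers, so one beat may produce several events, as the third tweet does here.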

In [None]:
from collections import defaultdict

# Base pitch for each Polemic marker category
polemic_notes = {'OK': 20, 'KO': 23, 'Q': 26, 'REF': 30}

sess = 0
for tweet in tweets:
    polemMIDI = MIDIFile(num_tracks)
    polemMIDI.addTempo(track, time, tempo)
    sess += 1
    midtime = 0                  # one beat per tweet, in session order
    polemic = defaultdict(int)   # running count of each marker in this session
    for t in tweet.to_numpy():
        midtime += 1
        _text = str(t[0])

        # OK (++): rising two-note figure, a fourth apart
        if "++" in _text:
            polemic['OK'] += 1
            polemMIDI.addNote(track, channel, polemic_notes['OK'], midtime, 0.3, volume)
            polemMIDI.addNote(track, channel, polemic_notes['OK'] + 5, midtime + 0.4, 0.3, volume)

        # KO (??): three quick repeated notes
        if "??" in _text:
            polemic['KO'] += 1
            polemMIDI.addNote(track, channel, polemic_notes['KO'], midtime, 0.1, volume)
            polemMIDI.addNote(track, channel, polemic_notes['KO'], midtime + 0.2, 0.1, volume)
            polemMIDI.addNote(track, channel, polemic_notes['KO'], midtime + 0.4, 0.1, volume)

        # Q (**): two notes struck together, a minor third apart
        if "**" in _text:
            polemic['Q'] += 1
            polemMIDI.addNote(track, channel, polemic_notes['Q'], midtime, 0.2, volume)
            polemMIDI.addNote(track, channel, polemic_notes['Q'] + 3, midtime, 0.2, volume)

        # REF (==): a single short note
        if "==" in _text:
            polemic['REF'] += 1
            polemMIDI.addNote(track, channel, polemic_notes['REF'], midtime, 0.2, volume)

    with open(str(sess) + "_polem.mid", "wb") as f:
        polemMIDI.writeFile(f)

Todo: write the base algorithm to map to time.

foreach time:
    time_difference = current - previous


If the time difference is smaller (tweets closer together), play louder; if it is larger, play quieter. Stay at the same volume if the differences are equal.
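One possible reading of this Todo, sketched under the assumption that "nearer" means the gap to the previous tweet is shorter than the gap before it. The step of 10 velocity units, the clamping range and the function name `gap_velocities` are arbitrary, hypothetical choices; the starting value of 62 matches `volume` above.

```python
def gap_velocities(times, base=62, step=10, lo=1, hi=127):
    """Map tweet timestamps to MIDI velocities.
    A shorter gap than the previous one raises the velocity by `step`,
    a longer gap lowers it, and an equal gap keeps it; values are clamped to [lo, hi]."""
    velocities = [base] if times else []
    volume, prev_gap = base, None
    for previous, current in zip(times, times[1:]):
        gap = current - previous  # the time_difference from the pseudocode
        if prev_gap is not None:
            if gap < prev_gap:
                volume = min(hi, volume + step)
            elif gap > prev_gap:
                volume = max(lo, volume - step)
        velocities.append(volume)
        prev_gap = gap
    return velocities

velocities = gap_velocities([0, 10, 15, 20, 40])
print(velocities)  # [62, 62, 72, 72, 62]
```

Comparing successive gaps, rather than thresholding each gap, is what gives the "stay the same if equal" case a natural meaning; the resulting list can be passed as the `volume` argument of `addNote` tweet by tweet.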

## Find the Techniques of Writing

Extract the URLs, usernames and hashtags. This looks at the techniques of writing made possible by computational platforms, particularly Twitter: @usernames and hashtags create different flows of writing through these platforms.
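The three regular expressions used in the cell below can be tried on a single tweet before they drive any MIDI. This sketch reuses the notebook's URL pattern and the simple `#(\w+)` and `@(\w+)` patterns; the sample text and the function name `writing_techniques` are hypothetical.

```python
import re

# URL pattern copied from the notebook; groups are (scheme, host, path)
URL_RE = re.compile(r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?')
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def writing_techniques(text):
    """Return the URLs (host + path), hashtags and @mentions found in a tweet."""
    text = str(text)  # guard against missing (NaN) text
    return {
        "urls": [host + path for _, host, path in URL_RE.findall(text)],
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }

found = writing_techniques("#enmi18 @vincentpuig voir https://t.co/abc123")
print(found)
```

The mention pattern is deliberately loose: an e-mail address such as `contact@example.com` would also register `example` as a mention, and a `#fragment` inside a URL would count as a hashtag.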

In [None]:
import re

num_tracks = 4

# URL pattern: groups are (scheme, host, path)
url_pattern = r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?'

sess = 0
for tweet in tweets:
    writingMIDI = MIDIFile(num_tracks)
    writingMIDI.addTempo(track, time, tempo)
    sess += 1

    midtime = 0   # one beat per tweet
    for t in tweet.to_numpy():
        search = str(t[0])   # guard against missing (NaN) text
        midtime += 1

        # URLs: t.co short links get a two-note figure, other hosts a single note
        for m in re.findall(url_pattern, search):
            print(sess)
            print(''.join(m[1:]))
            # very much to do, but link each URL to a type
            if m[1] == "t.co":
                writingMIDI.addNote(track, channel, 10, midtime, 0.5, volume)
                writingMIDI.addNote(track, channel, 20, midtime + 0.25, 0.5, volume)
            else:
                writingMIDI.addNote(track, channel, 20, midtime, 1, volume)

        # Hashtags: one note per hashtag
        for m in re.findall(r"#(\w+)", search):
            writingMIDI.addNote(track, channel, 30, midtime, 1, volume)

        # @usernames: one note per mention
        for m in re.findall(r"@(\w+)", search):
            writingMIDI.addNote(track, channel, 40, midtime, 1, volume)

        # Retweets (disabled):
        #for m in re.findall(r"RT", search):
        #    writingMIDI.addNote(track, channel, 10, midtime, 2, volume)

    # Create a file per session
    with open(str(sess) + "_writing.mid", "wb") as f:
        writingMIDI.writeFile(f)

Read the `source` field, which records the client used to write each tweet.

This might work as a background sound representing the method of writing. Polemic Tweet, which supports writing the annotations, runs on the web; iPhone clients also appear.
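Twitter stores the client as an HTML anchor, so the label has to be stripped out before counting. This sketch shows the cleaning and counting on a few raw values; the anchor texts are illustrative (the notebook output above is truncated), and the function name `source_label` is hypothetical.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<.*?>")

def source_label(raw):
    """Strip the HTML anchor around the client name; missing values (NaN/None) become 'unknown'."""
    if not isinstance(raw, str):
        return "unknown"
    return TAG_RE.sub("", raw).strip()

# Illustrative raw values in the shape of the 'source' column
raw_sources = [
    '<a href="https://polemictweet.com" rel="nofollow">Polemic Tweet</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="https://polemictweet.com" rel="nofollow">Polemic Tweet</a>',
    None,
]
source_counts = Counter(source_label(s) for s in raw_sources)
print(source_counts)
```

Cleaning before counting also merges entries whose anchor markup differs but whose visible label is the same.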

In [None]:
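import re
from collections import Counter

# Twitter stores the client as an HTML anchor; strip the tags to keep the label
cleanr = re.compile('<.*?>')

def cleanhtml(raw_html):
    # str() guards against missing (NaN) sources
    return re.sub(cleanr, '', str(raw_html))

sess = 0
for tweet in tweets:
    sess += 1
    print(sess)
    sources = Counter(tweet['source'])
    for k, v in sources.items():
        print("{} {}".format(cleanhtml(k), v))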