# Feminist commemoration of the 1916 Easter Rising and the ethics of Twitter as data

## Author 1

Author. 

## Author 2

Author, developer.

Ireland, 1916 Easter Rising, Twitter, feminism, commemoration, ethics

The centenary of the 1916 Easter Rising was celebrated in 2016, the centrepiece of a decade-long programme of national commemoration in the Republic of Ireland. Marked by digitality and widely declared a success for public engagement with history, the centenary also represented a turning point in feminist re-appraisal of the Irish revolution, one that has been reflected in wider societal shifts concerning the position and freedoms of Irish women today. Drawing upon the primary author's doctoral research, which investigated the nature of feminist commemoration through Twitter in 2016, this article demonstrates the methods and collaboration involved in collecting 'historical' tweets from the 1916 centenary via the Twitter Premium API as well as the ethical considerations and methodological challenges of such research.

# Introduction

Social media platforms are not only spaces in which digital heritage is shared or consumed as participatory arms of cultural entities, but in which remembering takes place, in which commemoration is mobilised such as for feminist historical activism. During the Irish centenary commemorations in 2016, the use of social media as a stage for public debate and a spotlight on women’s underrepresentation in the Easter Rising was observed <cite data-cite="4766306/8NUABFVV"></cite>. The 1916 Easter Rising was a rebellion against British colonial rule in Ireland, and it set in motion a renewed military and political campaign for independence, culminating in the Anglo-Irish Treaty of 1921, and the partition of six northern counties. The centenary of this pivotal event was part of a wider programme of national remembrance, the ‘Decade of Centenaries,’ in the Republic of Ireland and Northern Ireland. Social media is, as Clavert suggests, a tool for the mobilisation and (re)appropriation of commemoration <cite data-cite="4766306/2E3VISE7"></cite> and as a particularly ‘reactive medium’ <cite data-cite="4766306/GV83697G"></cite> Twitter lends itself to moments of national remembrance. This type of Twitter activity in the Irish commemorations peaked in 2016 during which a feminist discourse of remembrance was carved out online, reflecting a renewed and expanding historical consciousness of the women who shaped the course of Irish history. This shift followed decades of research and was further spurred on by the rise of public history, the opening and digitising of certain archives, and the affective economy of a national period of commemoration. The commemorations were also inflected by contemporary gender politics in the Republic. Many gendered grievances against the State in recent years have been mirrored in the demand for a more critical and representative historical narrative and a re-evaluation of whose heritage is valued in national commemoration. 
Twitter is a space in which commemoration, a relationship to the past and by extension collective memory and identities, may be performed, reified and challenged with fleeting intensity. It is an extension of both official and unofficial commemorations that amplify and interlock with online engagement around historically significant moments, and as such is a snapshot of the ways in which publics are critically engaging with the past in the present.

In [1]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 1: ‘Easter Rising 2016’ dataset, retweets removed (c. 139k original Tweets) 1 Jan - 31 May 2016"
            ]
        }
    }
}
Image(url= "media/1916-2016_Jan-May-tweets-plot.jpg", metadata=metadata)

These observations prompted a major aspect of the primary author's doctoral research: feminist commemoration on Twitter during the centenary of the Easter Rising (ref-thesis). For a researcher with a background in Public History and little prior knowledge of programming, this presented a daunting task for which, despite an abundance of Twitter studies, little could be found in the way of clarity and instruction for novices beyond computer science sources, perhaps especially so as it concerned historical, that is to say retrospectively collected, Tweets. Elsewhere discussed by the author is the archival value of Twitter - Twitter as cultural heritage - and how public historians and critical heritage studies scholars are well-placed to interpret this memory and history-making space, particularly with the partial liberalisation of the Twitter API (ref-forthcoming 2022). Methodological development and collaboration are needed to make this a wider reality. This article seeks to contribute to greater documentation of methods and methodology for such digital history and heritage work. It will therefore demonstrate the methods and collaboration involved in collecting historical tweets from the 2016 commemorations via the Twitter API for analysis informed by critical heritage studies and public history. Digital humanities methods were applied to the study of Irish national commemorations and, in this context, used to interrogate Twitter as it is embedded in the mediation of collective memory and identity in late-modern authorised commemoration. In their case study of the Fukushima disaster anniversary, Rantasila et al. pointed to the need for greater integration of qualitative approaches in network analyses, big-data, and social media research <cite data-cite="4766306/8CHKQ2KQ"></cite> and this study attends to this in the application of thematic qualitative analysis to Tweets collected.

Studies of Twitter and remembrance have tended to focus on the commemoration and memorialisation of recent traumatic events and less on nationalising, historical commemorations, Clavert’s work on the commemoration of the First World War in France between 2014 and 2019 being a notable exception. This longitudinal study is perhaps unique in collecting a dataset of several million tweets based on a national and international period of commemoration, and in its expansiveness, this study also captured a small number of tweets from the Irish celebrations in 2016. Primarily pertaining to the French commemorations, Clavert’s study explicitly sets out to interrogate the relationship that may exist between commemorations, collective memory, and social networks - specifically Twitter <cite data-cite="4766306/2E3VISE7"></cite>. The Irish commemorations have left - and continue to leave - a substantial digital trace, a *record* of activity in the evidentiary sense but also in the social media sense of user-generated content <cite data-cite="4766306/EJVNCVWR"></cite>. Twitter is therefore both a data source and an historical source, one that has archival value if not archivable in the traditional sense.

Social media, in this way, is both a litmus test for major events or upheavals and a birds-eye view of popular attitudes that coalesce around them <cite data-cite="4766306/ZEZX8GEI"></cite>. However, this tendency towards topic-based research is also a practical issue, as Boyd and Crawford remind, due to prior restrictions on historical data collection that Twitter imposed: at the time of researching and writing my doctoral thesis, upon which this article draws, it was not possible to collect Twitter data more than seven days in the past without recourse to a data purchase <cite data-cite="4766306/CQMSBDX8"></cite>. Recent changes to policy have heralded free and enhanced access to the full historical archive of Tweets for academic researchers. Pre-defined datasets are also being compiled and released by Twitter for research around global issues like the coronavirus pandemic <cite data-cite="4766306/XSRSN7R6"></cite>. The historical but also archival value of Twitter has long since been recognised in the fraught attempt by the Library of Congress to process the deposit of Twitter’s archive of public tweets since 2006, part of a broader drive to archive the web <cite data-cite="4766306/PAE5MSB5"></cite>. Web archives, inclusive of social media feeds, are increasingly understood as having historical value both for contemporary and future history, and therefore as digital heritage collected and preserved in institutional repositories and national web archive collections. Collecting but also archiving Tweet datasets based on keywords or hashtags (as opposed to user feeds) is also increasingly happening outside of formal memory institutions for both research and activism e.g., DocNow <cite data-cite="4766306/CBLKK9VX"></cite>.

Twitter data has, for many years now, been commodified through reselling services as well as by restricting free access to Tweet data through its Standard API to the past seven days only. And while the Cambridge Analytica scandal concerned Facebook and its subsidiaries, Twitter also responded by altering its API management process; as of July 2018, signing up for a Twitter Developers account for API access requires an application and authentication process, and agreement to terms of use that restrict the kinds of research that can be carried out <cite data-cite="4766306/A5QUR3PZ"></cite>. Through the Developers service, Twitter offers a subscription-based, three-tier paid access model, the upper ends of which are, for most researchers, prohibitively expensive. Generous doctoral funding allowed for such access (carried out in November 2019); this is a privileged position within the data-selling landscape, with its uneven API access and implications for research, and it must be acknowledged <cite data-cite="4766306/DJSMJU9J"></cite> <cite data-cite="4766306/A5QUR3PZ"></cite>. Equally, the ethics of Twitter as data are situated with the researcher.

# Ethics and Privacy

Infrastructural limitations on social media research influence the kind of research that can be conducted, and the kinds of questions asked: ‘The underlying features of social platforms impinge on research designs and data collection, as one cannot ask questions of data that is not possible to collect’ <cite data-cite="4766306/DJSMJU9J"></cite>. Closely related is the ‘ethics turn in social media,’ <cite data-cite="4766306/FRKCN4MR"></cite> as well as the tightening of general data protection regulations (GDPR) in the EU, which, combined with Twitter's policies, impacts the ways that researchers can report their findings. The work of Ahmed et al. is particularly instructive in navigating this terrain. Their overview of the main privacy challenges and ethical grey areas in social media research, with a focus on Twitter, is grounded in the approach that ‘traditional ethical principles such as consent, anonymity, and avoiding undue harm should also be applied to social media research’ <cite data-cite="4766306/5X4D39SC"></cite>. Twitter has always maintained that any Tweet is public information ‘by default’ <cite data-cite="4766306/ZS3Q6GP2"></cite> unless otherwise restricted through the user privacy settings. Only public tweets - those not protected by these settings - can therefore be collected through the Twitter API. Policy has since been updated to reflect recent currents in social media data re-use and privacy, re-affirming the public nature of Tweets but also that the responsibility for public Tweets and how they may be used elsewhere lies with the user: ‘You are responsible for your Tweets and other information you provide through our services, and you should think carefully about what you make public, especially if it is sensitive information’ <cite data-cite="4766306/ZS3Q6GP2"></cite>.
Elsewhere, individual choice and the burden of liability are similarly built into statements about the function of its APIs, where Twitter data is described as ‘unique from data shared by most other social platforms because it reflects information that users choose to share publicly’ <cite data-cite="4766306/5RJIC6DB"></cite>. Such nuances are indicative of a hands-off approach by social media companies that operate open and increasingly scrutinized communication environments, as well as tying with Myers and Hamilton’s assertion that ‘the form of Twitter similarly [to Facebook] embodies classical liberalism by also constituting…the user as an autonomous individual’ <cite data-cite="4766306/ZEZX8GEI"></cite>. 

Advocating for critical practice in the use of cultural data, Earhart contends that ‘Central to ethical engagement with large datasets that contain individual identifiers, such as is the case with tweets, is careful consideration of the positionality of the researcher and the development of a methodology that protects the privacy of individuals’ <cite data-cite="4766306/5EG6DA6S"></cite>. And if the ‘process of evaluating the research ethics cannot be ignored simply because the data are seemingly public’<cite data-cite="4766306/CQMSBDX8"></cite>, as researchers we cannot operate on the assumption that agreeing to the terms of service of Twitter is a proxy for consenting to be part of a university research project. Though Twitter is by all accounts considered public domain data, it behoves us to adhere to higher ethical standards than the legal technicalities afforded by terms of service <cite data-cite="4766306/5X4D39SC"></cite> that seek to limit corporate liability on the part of companies who profit from user generated data in what is always an asymmetric agreement. ‘Sensitivity’ of the data will be case dependent, and we should bear in mind not alone the topic of study (in this case commemoration), but the community of study, power dynamics, and our positionality in this equation, and the ways in which we then interpret and narrativize the data <cite data-cite="4766306/5EG6DA6S"></cite>.

‘Participants’ are considered to have given general consent upon agreeing to the terms of service; however, it remains unfeasible to obtain specific informed consent from thousands of users that appear in a set of public domain tweets collected by hashtag or keyword research. We can nonetheless respect user privacy and data confidentiality in the ways we conduct and present research, beginning with formal approval of a project by the host institution’s ethical oversight, a process that demands significant reflection on the potential consequences of our work (Note: Ethics approval was granted for the conduct of this data collection and analysis by the UCL Research Ethics Committee). Any analysis and presentation of data must be aggregated, without the use of direct quotes or publishing of usernames (with some exceptions) unless with informed consent, a mechanism which should be built into the ethics application and research planning rather than *post hoc*. As tweet datasets stored offline must be updated to reflect any subsequent account or Tweet deletions, such consent provisions further future-proof publication of the findings that use direct quotes. In this study, any direct references to tweets use only keywords or are significantly reworded or paraphrased so as not to be reverse searchable. An exception was made for accounts operated by publicly-funded entities such as national institutions, where they are of interest, and for ‘accounts of public interest’ <cite data-cite="4766306/D6I3Z7NC"></cite> that have verified (‘blue tick’) status, limited to organisational rather than private individuals' accounts, e.g. the account of the official commemorations body @ireland2016. As the topic of research did not fall under any of the categories of highly sensitive information, potential risk to participants through de-identification, reverse-searching of text or a data storage failure was low <cite data-cite="4766306/W5XS2QWQ"></cite>.
Data was collected, stored and analysed using an encrypted laptop, and stored securely using the UCL N:Drive research server. Qualitative thematic coding to analyse and report on the data was also drawn from Ahmed’s use of the method for researching public health and global pandemics <cite data-cite="4766306/W5XS2QWQ"></cite> <cite data-cite="4766306/FQ2JB35C"></cite>. 

Closely related to navigating ethics and privacy are considerations of copyright vis à vis contemporary and social media data. In the past, Twitter largely prohibited the sharing of full datasets except in the form of Tweet IDs with limits to how many could be shared per 30-day period, and only for non-profit academic research, meaning that even in this form there were technical and legal barriers to accessing datasets and evaluating research <cite data-cite="4766306/HWQ9TSU7"></cite>. Datasets may now be shared with peer-reviewers in the interest of research integrity and transparency, with upper limits on Tweet ID sharing increased <cite data-cite="4766306/QS4B4NIV"></cite>. Nevertheless, the interaction of GDPR, copyright restrictions, and the implementation of a rigorously ethical methodology creates difficulties for reporting on the data in a meaningful way, a serious problem given that evidence is the *sine qua non* of research integrity. As one hate-speech researcher found, the difficulty of obtaining informed consent to publish verbatim hateful tweet content (considered 'high-risk') and, effectively, the protection of hate-speech producers over the interests of those who are subjected to it, had profound consequences for the research focus and for methodological integrity, critical inquiry, and justice (i.e., that 'participation in, and gains from, research should be as equitable as possible') <cite data-cite="4766306/7X8C55RI"></cite>. For the digital historian, it also creates challenges for presenting a compelling narrative rather than quantitative descriptions, which is a methodological and epistemological problem as much as a regulatory one.

# Data Collection

> Data access is the first in a number of steps researchers have to take as they collect, process, validate, interpret, share, and archive the data. These steps often require robust technical skills, as API endpoints for data collection were designed for programmers building application software that adds to the services offered by social platforms <cite data-cite="4766306/DJSMJU9J"></cite>.  

The data collection was carried out by utilising Twitter’s ‘Premium’ Application Programming Interfaces (API). Using Python, the Premium API was leveraged to retrieve historical Tweets from 2016 for analysis. At the time, free access to Tweets was restricted to seven days in the past from the point of retrieval. As well as offering access to historical Tweets and enhanced metadata, paid Twitter APIs also afford complete access to matching Tweets whereas the free API returns an incomplete sample <cite data-cite="4766306/MM75RPG6"></cite>. Twitter has provided its own data-selling service called ‘Twitter Developers’ since 2017, which is subscription-based with three tiers of access to its APIs: Standard, Premium and Enterprise. ‘Academic Research’ was also already identified by Twitter as a significant use case within this framework, prior to the recent liberalisation. The Premium API provides access to the full archive of Tweets since 2006, that is to say ‘Filtered access to the entire public history of Tweets through Boolean queries’ <cite data-cite="4766306/J7K5MA3S"></cite>.

The creation of a programme for retrieving data through the Twitter API was a collaborative effort between the authors. Scholars of public history and heritage studies should be well placed to interrogate the many uses of the past in social media spaces. However, for those wishing to conduct social media research that falls outside the standard limitations of free APIs and the affordances of out-of-the-box tools, or who do not possess the skills required to utilise APIs in a timely fashion, collaboration with scholars in computer and data science is increasingly necessary and common: ‘collaboration is a normal practice of humanities computing and should therefore be imagined as part of any discussion of method’ <cite data-cite="4766306/BX8IMUQP"></cite>. Rockwell and Sinclair describe a process of ‘pair work’ that reflects the time we spent in trial and error - Author 2 at the keyboard, Author 1 reviewing and reflecting - discussing, testing, wading through documentation and ‘thinking through the code’ <cite data-cite="4766306/BX8IMUQP"></cite>. This was a constant dialogue about how to retrieve the desired data, what form it would take, what limitations we faced or what limitations to place upon the amount and types of data collected (extending into considerations of ethics and privacy), and what we wanted the data to look like after cleaning and processing in order to start making sense of it, and possibilities for analysis. In this way, we produced a programme to retrieve historical tweets through Twitter Developers, and supplementary data cleaning and processing programmes.

As a reference for designing and modifying the eventual query, the tweet metadata template below was retrieved using the Standard API and anonymised. It provides ‘deep JSON’ (JavaScript Object Notation), that is, nested information (multi-level, Russian doll-like), meaning it also shows the elements of the user metadata that may be of interest:

In [None]:
[
{
'created_at': 'Fri Jan 1 00:00:00 +0000 2021',
'id': xxxxxxxxxxxxxxxxx,
'id_str': 'xxxxxxxxxxxxxxxxx',
'full_text': 'This is the full metadata of a Tweet.',
'truncated': False,
'display_text_range': [0, 280],
'entities': {
	'hashtags': [],
	'symbols': [],
	'user_mentions': [],
	'urls': []
},
'metadata': {
	'result_type': 'popular',
	'iso_language_code': 'en'
},
'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': None,
'in_reply_to_user_id_str': None,
'in_reply_to_screen_name': None,
'user': {
	'id': xxxxxxxx,
	'id_str': 'xxxxxxxx',
	'name': 'Tweet M. Data',
	'screen_name': 'T_metadata',
	'location': 'Twittersphere',
	'description': 'I Tweet, therefore I am.',
	'url': 'https://t.co/xxxxxxxxxx',
	'entities': {
		'url': {
			'urls': [{
				'url': 'https://t.co/xxxxxxxxxx',
				'expanded_url': 'http://www.tweetmetadata.com',
				'display_url': 'tweetmetadata.com',
				'indices': [0, 10]
			}]
		},
		'description': {
			'urls': []
		}
	},
	'protected': False,
	'followers_count': 0,
	'friends_count': 0,
	'listed_count': 1,
	'created_at': 'Mon Jan 01 00:00:00 +0000 2006',
	'favourites_count': 0,
	'utc_offset': None,
	'time_zone': None,
	'geo_enabled': False,
	'verified': True,
	'statuses_count': 0,
	'lang': 'en',
	'contributors_enabled': False,
	'is_translator': False,
	'is_translation_enabled': False,
	'profile_background_color': 'XXXXXX',
	'profile_background_image_url': None,
	'profile_background_image_url_https': None,
	'profile_background_tile': True,
	'profile_image_url': None,
	'profile_image_url_https': None,
	'profile_banner_url': None,
	'profile_link_color': 'XXXXXX',
	'profile_sidebar_border_color': '000000',
	'profile_sidebar_fill_color': '000000',
	'profile_text_color': '000000',
	'profile_use_background_image': False,
	'has_extended_profile': False,
	'default_profile': True,
	'default_profile_image': True,
	'following': None,
	'follow_request_sent': None,
	'notifications': None,
	'translator_type': 'regular'
},
'geo': None,
'coordinates': None,
'place': None,
'contributors': None,
'is_quote_status': False,
'retweet_count': 0,
'favorite_count': 0,
'favorited': True,
'retweeted': False,
'lang': 'en'
}
]


While tweets provide a wealth of metadata, not all of it will be of use or interest to the researcher, so it was decided to be selective about which attributes to retrieve and which to leave out. By specifying tweet attributes in the code, the programme we created returns only the specific information asked for rather than the full tweet metadata, and the list of attributes can be modified and expanded easily <cite data-cite="4766306/DTU6X99K"></cite>. Requesting the ‘entities’ attribute returns usernames, user mentions, hashtags, URLs etc. in JSON format that are easily parsed for use later <cite data-cite="4766306/ZTXA7Q3C"></cite>. In this way, some privacy and data protection issues can be minimized at the request level, as well as eliminating unnecessarily cumbersome data that ultimately may not be of interest to the research. This was pragmatic both in terms of the research interests and the management of large amounts of data, and in reducing superfluous personally identifying data that may constitute a privacy concern <cite data-cite="4766306/5X4D39SC"></cite>. The following is an overview of the key features of the programme.

# Programme design

The [Python](https://www.python.org/) programming language was used to query and process the Twitter API due to the availability of a wide variety of libraries to collect, clean, and process data. Tweets are obtained through queries performed using the *[TwitterAPI](https://github.com/geduldig/TwitterAPI)* library, which encapsulates all the required functionality to request tweets from the Twitter API (This may lend itself to some confusion, since the name of the Python library used to query the Twitter API is *TwitterAPI*. Aside from the spacing difference, references to the Python library will appear in italics).

This library uses a set of access codes generated in the Twitter Developer portal to authenticate the requests to the Twitter API (`api = TwitterAPI("XXX", "XXX", "XXX", "XXX", auth_type='oAuth2')`).

Aside from the authentication tokens, it is necessary to create a label to refer to the subscription plan or product contracted in the Twitter Developer portal (`LABEL = 'Pilot'`), as well as the product type. The product type refers to the type of history that can be queried, where the two options at the time of this research were to query tweets from the last 30 days or query the full history of tweets (`PRODUCT = 'fullarchive'` or `PRODUCT = '30day'`).

In [None]:
from TwitterAPI import TwitterAPI

# Enter unique access tokens generated in Twitter Developers for authentication.
api = TwitterAPI("XXXXXXXXXXXXXXXXXXXXXXXXXX",
                 "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
                 "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
                 "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
                 auth_type='oAuth2')
                 
LABEL = 'Pilot' # Labels created in Developers.
PRODUCT = 'fullarchive' # Use this for full history
# PRODUCT = '30day'#Use this for 30-day archive

The final step in this process was to create and insert a ‘query’ i.e., search terms, using keyword combinations and hashtags to match the request with desired tweets (`SEARCH_TERM = ' #keywords lang:en -is:retweet'`). The structure of the queries used in this research will be described in more detail in the next section.

<!-- @DR
Possibly the most relevant piece of information that is defined in the Python script is the **query** (`SEARCH_TERM = ' #keywords lang:en -is:retweet'`). The query is made up of a set of expressions that are used to filter the types of tweets that are received from the Twitter API. Tweets can be filtered by date ranges, combinations of hashtags, exact matches in the text, the language in which the tweet is written, whether to include retweets in the results, the location that is associated with the tweet, among many other options. The query will be described in more detail in the next section.
-->

In [None]:
# Enter query. Define language. Option to remove retweets at point of retrieval using the '-is:retweet' operator.
SEARCH_TERM = ' #keywords lang:en -is:retweet'
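
The way such a query string is put together can be sketched as follows. This is a minimal illustration rather than the code used in the study: the helper `build_query` and the placeholder hashtags are assumptions introduced here, and only the `lang:` and `-is:retweet` operators come from the query shown above.

```python
def build_query(terms, language="en", include_retweets=False):
    """Join search terms with OR, then append language and retweet operators."""
    if not terms:
        raise ValueError("at least one hashtag or keyword is required")
    joined = " OR ".join(terms)
    if len(terms) > 1:
        # Group the alternatives so the filters apply to all of them.
        joined = f"({joined})"
    parts = [joined, f"lang:{language}"]
    if not include_retweets:
        # Exclude retweets at the point of retrieval.
        parts.append("-is:retweet")
    return " ".join(parts)


# Placeholder hashtags for illustration only, not the terms used in the study.
SEARCH_TERM = build_query(["#hashtagA", "#hashtagB"])
print(SEARCH_TERM)
```

Building the string from a list keeps the search terms easy to review and adjust between pilot runs, which matters when each request counts against a paid quota.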

When the programme is first run, it creates a new CSV file into which the data is exported; the file name can be changed to begin a new dataset as required (e.g., `file_name = 'output.csv'`).

In [None]:
# Filename and structure of the table
file_name = "output.csv"

Next, we created a metadata dictionary. The metadata include the Tweet ID (string of numbers), date (date and exact time stamp of when the Tweet was sent), location (the area the user specifies in their bio, e.g. ‘Co. Dublin’), the Tweet favourite count, the number of retweets, and language, together with the full text of the tweet and additional information about the tweet and the user who published it:

In [None]:
# Define which attribute (i.e. desired metadata) to print in columns. Refer to Tweet metadata template and adjust. Must correlate with for loop (see below).
dict_ = {"request": [], "id": [], "date": [], "text": [], "location": [], "place": [],"coordinates": [], "favorite_count": [], "retweets": [], "language": [], "quote_count": [], "reply_count": [], "in_reply_to_status_id": [], "source": [], "query": [], "entities": []}
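
How one nested tweet might be flattened into these columns can be sketched as below. This is an illustrative assumption about the mapping, not the authors' loop: the helper `append_tweet` and the sample tweet are hypothetical, and the choice of which tweet attribute feeds each column (for example `retweet_count` for `retweets`, or the user's `location`) is inferred from the metadata template and the column names.

```python
COLUMN_NAMES = ["request", "id", "date", "text", "location", "place",
                "coordinates", "favorite_count", "retweets", "language",
                "quote_count", "reply_count", "in_reply_to_status_id",
                "source", "query", "entities"]


def append_tweet(columns, tweet, request_index, query):
    """Copy the selected attributes of one tweet into the column lists."""
    user = tweet.get("user") or {}
    row = {
        "request": request_index,
        "id": tweet.get("id_str"),
        "date": tweet.get("created_at"),
        "text": tweet.get("full_text", tweet.get("text")),
        "location": user.get("location"),  # free-text area from the user bio
        "place": tweet.get("place"),
        "coordinates": tweet.get("coordinates"),
        "favorite_count": tweet.get("favorite_count"),
        "retweets": tweet.get("retweet_count"),
        "language": tweet.get("lang"),
        "quote_count": tweet.get("quote_count"),
        "reply_count": tweet.get("reply_count"),
        "in_reply_to_status_id": tweet.get("in_reply_to_status_id"),
        "source": tweet.get("source"),
        "query": query,
        "entities": tweet.get("entities"),
    }
    # Every column receives exactly one value, so the lists stay aligned.
    for name, values in columns.items():
        values.append(row.get(name))


columns = {name: [] for name in COLUMN_NAMES}
sample_tweet = {  # hypothetical tweet, trimmed from the template above
    "id_str": "0", "created_at": "Fri Jan 01 00:00:00 +0000 2021",
    "full_text": "This is the full metadata of a Tweet.",
    "user": {"location": "Twittersphere"},
    "retweet_count": 0, "favorite_count": 0, "lang": "en",
    "entities": {"hashtags": [], "user_mentions": [], "urls": []},
}
append_tweet(columns, sample_tweet, 0, " #keywords lang:en -is:retweet")
```

Attributes that are absent from a tweet are stored as `None`, so a missing field never shifts the remaining columns.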

These parameters set the column headings within the CSV that is produced, such as the test example below ([figure 2](#figure-2)). The ways in which the tweet metadata retrieved have been structured and filtered necessarily produce a more limited representation of the potential dataset. Just as any visualisations or graphical transformations of the dataset elements will be a mediated representation, the ‘dataset is already an extraction from a corpus, text, or aesthetic work, and a remediation’ <cite data-cite="4766306/MVJ4SFZD"></cite>; in other words, as Drucker points out, it is already a ‘derivative’ <cite data-cite="4766306/MVJ4SFZD"></cite>. These representational limitations need to be acknowledged and accounted for in both the collection of data and the inferences we make from it.

In [2]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 2 : Test example of data returned and tweet attributes structured in CSV format"
            ]
        }
    }
}
Image(url= "media/tweet_attributes_blurred2.jpg", metadata=metadata)

Running the programme repeatedly and specifying the same file continues adding data to this file from the point it finished in each previous request. This is a necessary step due to the limited quotas defined in the Twitter API.
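
The append-on-rerun behaviour can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation: the helper `append_rows`, the three-column layout and the temporary file are assumptions made for the example.

```python
import csv
import os
import tempfile


def append_rows(file_name, fieldnames, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new_file = not os.path.exists(file_name)
    with open(file_name, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        writer.writerows(rows)


fieldnames = ["id", "date", "text"]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "output.csv")
    # Two separate runs against the same file name.
    append_rows(path, fieldnames, [{"id": "1", "date": "d1", "text": "first run"}])
    append_rows(path, fieldnames, [{"id": "2", "date": "d2", "text": "second run"}])
    with open(path, newline="", encoding="utf-8") as handle:
        rows_read = list(csv.DictReader(handle))

print(len(rows_read))
```

Opening the file in append mode means an interrupted or quota-limited run never overwrites what earlier runs collected.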

‘Rate limits’ - the number of requests that can be made through an API per unit of time - are much higher with Premium level access <cite data-cite="4766306/XFH6ZQN4"></cite>. There is currently an upper limit of requests of 60 per minute and a maximum number of Tweets per request of 500 for the Premium subscription plan. To maximise the number of tweets obtained per query to the API, 60 requests are automatically submitted each time the programme is run (`NUMBER_OF_REQUESTS = 60`), which amounts to 30,000 tweets. Monthly limits also apply <cite data-cite="4766306/VH75KP48"></cite>. Each new request begins from the last tweet in the previous request without duplicating the last tweet. The first column in the CSV therefore counts the tweets in each request and shows where each new request begins, looping from 499-0 each time. Tweets are returned in reverse chronology, starting from the most recent tweet. 
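
The quota arithmetic above can be made explicit with a short calculation. The figure of 139,000 tweets is taken from the caption of figure 1 and is used only as an example; the function names are introduced here for illustration.

```python
import math

MAX_TWEETS_PER_REQUEST = 500  # Premium limit stated above
NUMBER_OF_REQUESTS = 60       # requests submitted each time the programme is run


def tweets_per_run(requests=NUMBER_OF_REQUESTS, per_request=MAX_TWEETS_PER_REQUEST):
    """Upper bound on the tweets retrieved in a single run."""
    return requests * per_request


def runs_needed(expected_tweets):
    """Minimum number of runs needed to retrieve a dataset of the given size."""
    return math.ceil(expected_tweets / tweets_per_run())


print(tweets_per_run())      # upper bound per run
print(runs_needed(139_000))  # runs for a dataset of roughly 139k tweets
```

Estimating the number of runs in advance helps when budgeting against the monthly limits that also apply.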

In [None]:
NUMBER_OF_REQUESTS = 60
is_new_file = True
total_num_tweets = 0

# REQUEST LOOP (WE GENERATE A NEW REQUEST TO THE API EACH TIME)
for i in range(NUMBER_OF_REQUESTS):
    print("Request number ", i)

Most importantly, the Premium API ‘Full Archive’ endpoint allows us to request tweets from a specific time period using the `fromDate` and `toDate` parameters (Note: An ‘endpoint’ refers to one end of a communication channel, which moderates the kind of data that can be accessed through the API) <cite data-cite="4766306/FZNE756Q"></cite>.

Given that all collected tweets are written to a single CSV file, every new request needs to start at the exact date and time at which the last tweet collected was published. This allows the collection of tweets chronologically and in a way that avoids duplicates and ensures no tweets that match the query are skipped. If the CSV file does not yet exist (`os.path.exists(file_name)` evaluates to `False`), i.e. when running the very first request, the `toDate` is the parameter that is specified by the user when defining a date range (`date = 'YYYYMMDD0000'`). If the CSV file exists, the publication date and time of the last tweet recorded in the file is read and incorporated into the `toDate` parameter in the query.

If the date of the last tweet extracted is earlier than or equal to the `fromDate` specified by the user, that implies there are no additional tweets to extract in that time period. The programme then stops making additional requests (`if int(date) <= int(from_date): break`).

In [None]:
date = 'YYYYMMDD0000'
now = time.strftime('%Y-%m-%d %H:%M', time.gmtime())

if os.path.exists(file_name):
    # CSV file exists, read existing file
    with open(file_name, "r") as file:
        is_new_file = False
        csvfile = csv.DictReader(file)
        for row in reversed(list(csvfile)):
            current_tweet_num = int(row['index']) + 1
            
            # Only read the last date if the last tweet was
            # extracted with the same query as the current one
            if row['query'] == SEARCH_TERM:
                created_at = row['date']
                date = time.strftime('%Y%m%d%H%M',time.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y'))
            break
else:
    print('Creating new file ', file_name)
    print('TO DATE: ', date)

from_date = 'YYYYMMDD0000'

# This should stop requests when they reach this date
if int(date) <= int(from_date): break
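The date handling can be tried in isolation with the sketch below. The function names `to_api_date` and `reached_start` are hypothetical and are not part of the programme; the two format strings are those used in the cell above, and the sample timestamp is invented.

```python
import time

# Format of the 'created_at' field as parsed in the programme above
TWITTER_DATE_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'
# Format expected by the fromDate / toDate parameters (YYYYMMDDHHMM)
API_DATE_FORMAT = '%Y%m%d%H%M'


def to_api_date(created_at):
    """Convert a tweet's 'created_at' string into the YYYYMMDDHHMM form."""
    return time.strftime(API_DATE_FORMAT,
                         time.strptime(created_at, TWITTER_DATE_FORMAT))


def reached_start(date, from_date):
    """True when the last tweet is at or before the start of the date range."""
    return int(date) <= int(from_date)


# Invented example timestamp, for illustration only
date = to_api_date('Mon Mar 28 10:15:30 +0000 2016')
print(date)                                  # 201603281015
print(reached_start(date, '201601010000'))   # False, so requests continue
```

Because both values are zero-padded strings of the same length, comparing them as integers gives the same ordering as comparing the dates themselves.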

The information described earlier is used to construct a request to the API. `PRODUCT` and `LABEL` information are used to define the database and account, respectively, to direct the request to. After determining *where* to retrieve tweets from, the next step is to determine *which* tweets to retrieve. This is specified by a combination of the search query (`SEARCH_TERM`), the date ranges (`from_date`, `date`), and the maximum number of tweets to retrieve (`500` is used to obtain the largest number of tweets allowed by the API).

The following line of code sends the request to the API and places all of the results that are received in a variable (`r`) for subsequent manipulation.

In [None]:
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
                {'query': SEARCH_TERM, 'maxResults':'500', 'fromDate': from_date, 'toDate': date})

The results stored in `r` contain all of the tweets from the request, and we receive all of the publicly available information for every tweet, such as information about location, retweets, number of times it has been marked as favourite, and the metadata of the publishing user. Every tweet is processed inside a for loop, `for item in r:`, that iterates through every tweet stored into `r` and uses `item` as a temporary variable that contains an individual tweet. The `item` variable contains all of the aforementioned information about the individual tweet which it stores in a dictionary format, meaning that every *value* related to that tweet is stored as an *entry* that is associated with a particular *key*. For example, a tweet that has an ID of `24` (in the internal Twitter database) will store the number `24` inside the dictionary entry related to the `id` keyword: `item["id"] = 24`.

There is a substantial amount of information associated with any particular tweet, so we decided to store only the information most relevant to our research goals. This is done by adding the information from the *columns* in the source data (`item`) into our predefined metadata dictionary (`dict_`). Our `dict_` object is not a simple dictionary but a dictionary of lists: for every key, we keep a list of values, one for each tweet processed. In this sense, the dictionary entry for tweet IDs would look something like this: `dict_["id"] = [24, 12, 153, 221, 9, ...]`. Adding a piece of information from the source to our dictionary is therefore not a simple value assignment, but rather amounts to *appending* a value to a list. For the tweet ID, this is done in the following line of code: `dict_["id"].append(item["id"])`.
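A minimal sketch of the dictionary-of-lists pattern is given here. The two tweets in `fake_results` are invented stand-ins for the items returned by the API, and only three of the columns are shown.

```python
# Hypothetical subset of the columns kept for each tweet
columns = ["id", "date", "retweets"]

# One empty list per column
dict_ = {column: [] for column in columns}

# Invented tweets standing in for the items returned by the API
fake_results = [
    {"id": 24, "created_at": "Mon Mar 28 10:15:30 +0000 2016", "retweet_count": 3},
    {"id": 12, "created_at": "Sun Mar 27 09:00:00 +0000 2016", "retweet_count": 0},
]

for item in fake_results:
    # Each tweet appends exactly one value to every list
    dict_["id"].append(item["id"])
    dict_["date"].append(item["created_at"])
    dict_["retweets"].append(item["retweet_count"])

print(dict_["id"])   # [24, 12]

# Reading across the lists at the same position recovers one tweet per row
rows = list(zip(*(dict_[column] for column in columns)))
print(rows[1])
```

Because every list grows by one entry per tweet, the lists stay the same length, which is what later allows them to be read as the columns of a table.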

Many of the fields only need to be copied over to our dictionary. Other fields, however, need to undergo additional operations to get the data in the right format for subsequent storage. Text fields, such as the text of the tweet or the location, need to be encoded into UTF-8 before we save them to our CSV file (`text.encode(encoding='utf-8', errors='ignore')`). Certain fields contain additional structured information, so in these cases the data needs to be *serialised* to a [JavaScript Object Notation (JSON)](https://www.json.org/json-en.html) string so that it can be stored in a single CSV cell and parsed again later: `json.dumps(item["entities"])`.

We store information about the requests from which every tweet came, such as the number, time, and date of the request (`str(i+1)+' - '+now`), and the query that was used to obtain them (`SEARCH_TERM`).

Additionally, we wrote in an option to anonymise all user handles that appear in the text of a tweet (`re.sub(r"(?:\@)\S+", "@USER", item["text"])`), although we are not currently using that option (the line is commented out in the code).

<!--
When the metadata parameters i.e., columns, are modified, we must also modify the ‘for’ loop, which allows for correct iteration and appending of data, to match the new requirements, e.g.: 
-->

In [None]:
# This loops through all the tweets in r
for item in r:
#     dict_["index"].append((current_tweet_num + num_tweets))
    dict_["request"].append(str(i+1)+' - '+now)
    dict_["id"].append(item["id"])
    dict_["date"].append(item["created_at"])
    # text = re.sub(r"(?:\@)\S+", "@USER", item["text"])
    text = item["text"]
    text = text.encode(encoding='utf-8', errors='ignore')
    dict_["text"].append(str(text))
    dict_["location"].append(str(item["user"]["location"]).encode(encoding='utf-8', errors='ignore'))
    dict_["place"].append(json.dumps(item["place"]))
    dict_["coordinates"].append(json.dumps(item["coordinates"]))
    dict_["favorite_count"].append(item["favorite_count"])
    dict_["retweets"].append(item["retweet_count"])
    dict_["language"].append(item["lang"])
    dict_["quote_count"].append(item["quote_count"])
    dict_["reply_count"].append(item["reply_count"])
    dict_["in_reply_to_status_id"].append(item["in_reply_to_status_id"])
    dict_["source"].append(item["source"])
    dict_["query"].append(SEARCH_TERM)
    dict_["entities"].append(json.dumps(item["entities"]))
    num_tweets += 1
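The behaviour of the optional anonymisation step can be checked on its own with the sketch below. The sample tweet is invented, and the second pattern (`@\w+`) is not part of the programme; it is shown only as a possible stricter alternative.

```python
import re


def anonymise_handles(text):
    """Replace user handles using the pattern from the commented-out line."""
    return re.sub(r"(?:\@)\S+", "@USER", text)


def anonymise_handles_strict(text):
    """Hypothetical variant: match only letters, digits and underscores."""
    return re.sub(r"@\w+", "@USER", text)


# Invented example tweet
sample = "Thanks @someone and @other_user! #Womenof1916"

print(anonymise_handles(sample))
print(anonymise_handles_strict(sample))
```

The pattern used in the programme matches every non-whitespace character after the `@`, so punctuation attached to a handle (the `!` in the sample) is replaced along with it. The stricter variant leaves that punctuation in place.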

After iterating through all of the tweets returned by the request and storing them in our predefined dictionary they are converted into a [pandas](https://pandas.pydata.org/) *DataFrame*, a tabular format where every tweet corresponds to a row, and every field defined in the vocabulary is mapped to a column. Additionally, an index column is added to the DataFrame to keep an internal consecutive identifier for the tweets retrieved (`pd.DataFrame(dict_, index=indices)`). The completed DataFrame is then saved to the pre-specified CSV file.

In [None]:
# Structure data in a pandas DataFrame for easier manipulation
indices = list(reversed(range(current_tweet_num,(current_tweet_num+num_tweets))))

df = pd.DataFrame(dict_, index=indices)
df.index.name = 'index'

with open(file_name, "a") as f:
    df.to_csv(f, header=is_new_file, encoding='utf-8')
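The append-and-write-the-header-once pattern can also be sketched with the standard library alone. This is an illustration rather than the programme's own code: `append_rows` is a hypothetical helper, the rows are invented, and a temporary directory is used so that nothing is left on disk.

```python
import csv
import os
import tempfile


def append_rows(file_name, fieldnames, rows):
    """Append rows to a CSV file, writing the header only for a new file."""
    is_new_file = not os.path.exists(file_name)
    with open(file_name, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    # First run: the file is created and the header is written
    append_rows(path, ["index", "id"],
                [{"index": 1, "id": 24}, {"index": 0, "id": 12}])
    # Second run: rows are appended without repeating the header
    append_rows(path, ["index", "id"],
                [{"index": 3, "id": 153}, {"index": 2, "id": 221}])
    with open(path, newline="", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

As in the programme, the index counts down within each batch because tweets arrive in reverse chronological order.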

## Query Design

Boolean search queries were created for requesting Twitter data around distinct topics of commemoration. With a character limit of 1,024, these queries consist of a mixture of hashtags, keywords and keyword combinations using rules such as the ‘OR’ statement, double quote enclosing (exact match) and parentheses to structure more complex combinations. These were arrived at using the ‘snowballing’ technique to gather relevant hashtags and keywords via the Twitter interface. Unlike phenomena such as #Brexit and #Covid19, the Irish commemorations have generated a much smaller number of tweets that could be retrieved using a few hashtags alone. A broad query base using keyword and hashtag combinations was required to capture as many tweets as possible.

A filtering statement (‘-from:userhandle’) was necessary after a ‘bot’ was identified in the testing stages whose automated tweets visibly skewed the data. Spam and link-baiting may occur around highly-used hashtags <cite data-cite="4766306/5X4D39SC"></cite> and a case of this was identified. The ‘bot’ in question had repeated hundreds of variations of the same tweet based on a line of the 1916 Proclamation of the Irish Republic throughout 2016. It was possible to filter it out at the point of retrieval by altering the queries thereafter (albeit eating into the query character limit). However, these kinds of automated accounts can interfere with the veracity of analyses and results, as was observed when a simple word cloud and word frequency analysis was carried out using Voyant Tools <cite data-cite="4766306/IN2M9BSH"></cite>, which can be installed and run locally and modified to secure the data on the local server. English was specified for consistency (‘lang:en’). Relevant Irish-language tweets were therefore not captured in this study; however, Irish-language hashtags such as #Mná1916 attached to English-language tweets were. The following query was constructed to create the ‘Women of 1916’ dataset from 1 August 2015 to 1 January 2017:

In [None]:
SEARCH_TERM = '(#Womenof1916 OR #woman0f1916 OR #WomenIn1916 OR #CumannNamBan OR #Mná1916 OR #Women1916 OR #womenoftherising OR ((women OR woman OR #irishwmnhist) ("Easter Rising" OR "1916 Rising")) OR "cumann na mban" OR "Inghinidhe na hEireann" OR "women of 1916" OR (((women OR woman OR #irishwmnhist) (commemoration OR centenary))(1916 OR "Easter Rising")) OR ((#wakingthefeminists OR #WTFeminists) (1916 OR #1916rising OR "easter rising" OR #ireland2016 OR #1916centenary OR #irishwmnhist OR )) OR (1916 women (feminism OR #internationalwomensday OR #IWD2016 OR equality)) OR (markievicz (constance OR countess)) OR "Margaret Skinnider" OR "Helena Moloney" OR "Elizabeth O\'Farrell" OR "Grace Gifford" OR "Winifred Carney" OR (Rosie Hackett 1916) OR "Rosamond Jacob" OR (irish women revolutionaries) OR (Irish woman revolutionary) OR ((woman OR woman) Irish Citizen army) OR "Hanna Sheehy-Skeffington" OR "Kathleen Clarke" OR "widows of 1916" OR (#Revolutionarywomen (1916 OR "easter rising"))) lang:en -from:xxxxxxxxxxxx'
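The way such a query is assembled can be illustrated with a much shorter example. The helpers `any_of` and `build_query` are hypothetical and were not used to produce the query above; the account name `example_bot` is a placeholder. Note that this sketch always quotes multi-word terms for an exact match, whereas the full query also uses unquoted groups in parentheses.

```python
MAX_QUERY_LENGTH = 1024  # character limit stated above


def any_of(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = ['"%s"' % term if ' ' in term else term for term in terms]
    return '(' + ' OR '.join(quoted) + ')'


def build_query(terms, lang='en', exclude_user=None):
    """Assemble a query string and check it against the character limit."""
    parts = [any_of(terms), 'lang:' + lang]
    if exclude_user:
        # Filter out a single account, e.g. an identified bot
        parts.append('-from:' + exclude_user)
    query = ' '.join(parts)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError('Query is %d characters long' % len(query))
    return query


query = build_query(['#Womenof1916', '#Mná1916', 'cumann na mban'],
                    exclude_user='example_bot')
print(query)
print(len(query), 'of', MAX_QUERY_LENGTH, 'characters used')
```

Checking the length before sending a request makes the cost of each added term or filter visible, including the characters taken up by the ‘-from:’ statement.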


While the Twitter Developers ‘Sandbox’ (see below) was used mainly to test and refine the functionality of the Python programme (fig. 4) using only two or three hashtags at a time e.g. #Dáil100, the public Twitter search function also has much utility in creating and refining complex search queries. As these queries can become very complex with the use of multiple rules and combinations, the public search tool was used to simultaneously build and test the queries before implementing them. Although the character limit is smaller, this allowed us to check that queries, or sections of queries, functioned correctly and returned the expected results, somewhat circumventing the limits of the Sandbox. The Twitter search function recognises the same rules and allows us to check that combinations are correct and the ‘advanced’ search option permits the exploration of search terms and historical tweets within certain date parameters.

In [3]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 4 : Sandbox usage. Twitter Developers"
            ]
        }
    }
}
Image(url= "media/Dashboard_sandbox usage 2019-01-25 at 14.55.48.jpg", metadata=metadata)

The queries for this data collection were designed with feminist commemoration as a significant focus; other researchers might create different sets of search parameters around the same centenary and the same issue, and create several different datasets. Search queries are, in this way, somewhat idiosyncratic. And while the ability to share datasets in the form of Tweet IDs is undoubtedly beneficial, such datasets cannot be altered to include missing hashtags or keywords deemed essential by another researcher. No such query-specific collection of tweets will be complete or wholly representative. Providing lists of hashtags and keywords used to create such datasets is important for understanding the scope of data retrieved as well as illustrating the limitations of thematic query-based datasets.

# Data pre-processing and analysis

### Data Cleaning

The tweet statement was the primary vector of analysis in this study and the most important units of data were therefore text, user mentions and hashtags, and to a lesser extent URLs and emojis. To be cognisant of Twitter as a medium we must account for the embeddedness of such supra-textual features as hashtags, or ‘natively digital objects’ <cite data-cite="4766306/FHW4H34J"></cite>, which are also user-generated by design and therefore more meaningful than simple tags, and which constitute a Tweet as a message. Different permutations of the cleaning process are possible, though removing URLs, retweets, and other ‘noise’ in the text takes priority. The following Python programme written by Author 2 can unescape HTML entities, remove or retain retweets, URLs, hashtags and user mentions as required, and writes the ‘cleaned’ data to a new file. This text cleaning script is structured like a waterfall, where the result from the first stage of cleaning (`result = text.replace('\n', ' ')`) is fed into the second stage and stored in the same variable (`result`), and this is repeated for every stage of the cleaning process.

In [None]:
# Regular expression library
import re
# Library to deal with HTML tags
import html

def clean_text(text):
    # Remove line break characters
    result = text.replace('\n', ' ')
    # Unescape HTML entities (e.g. '&amp;' becomes '&'); done before the
    # hashtag step so numeric entities such as '&#39;' are not cut as hashtags
    result = html.unescape(result)
    # Remove URLs
    result = re.sub(r"http\S+", ' ', result)
    # Remove hashtags
    result = re.sub(r"#\S+", ' ', result)
    # Remove user handles
    result = re.sub(r"@\S+", ' ', result) #comment out each as needed
    # Return the fully processed text
    return result
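The idea of removing or retaining features ‘as required’ could also be expressed with keyword flags rather than by commenting lines in and out. The following is a sketch under that assumption, not the script used in the study: the function name and flags are hypothetical, the sample tweet is invented, and two choices are added here (HTML entities are unescaped before anything is stripped, and runs of whitespace are collapsed at the end).

```python
import html
import re


def clean_text_with_options(text, remove_urls=True, remove_hashtags=True,
                            remove_mentions=True):
    """Hypothetical flag-based variant of the cleaning waterfall."""
    # Remove line break characters
    result = text.replace('\n', ' ')
    # Unescape HTML entities first, so '&#39;' is not mistaken for a hashtag
    result = html.unescape(result)
    if remove_urls:
        result = re.sub(r"http\S+", ' ', result)
    if remove_hashtags:
        result = re.sub(r"#\S+", ' ', result)
    if remove_mentions:
        result = re.sub(r"@\S+", ' ', result)
    # Collapse the gaps left behind by the removed items (an addition here)
    return ' '.join(result.split())


# Invented example tweet
sample = "Remembering the #Womenof1916 &amp; @someone\nhttps://t.co/abc"

print(clean_text_with_options(sample))
print(clean_text_with_options(sample, remove_hashtags=False))
```

Keeping hashtags while removing URLs and mentions, as in the second call, is one of the permutations that matters when hashtags are themselves an object of analysis.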


Emojis are common in tweets and presented as ‘encoding failures’ <cite data-cite="4766306/BX8IMUQP"></cite> in the cleaned text.  This was dealt with by modifying the encoding format from ‘utf-8’ to ‘utf-8-sig’ for both reading and saving all new CSV files (e.g. `open(file_name, 'r', encoding='utf-8-sig')`).

The following Python script reads in a CSV file containing the extracted tweets (`with open(file_name, 'r', encoding=...`), reads in the file's header (`header = next(data)`) and structures it into a dictionary (`cols = {col_name: col_num ...`), and goes through the contents of the file, one row (i.e. tweet) at a time (`for row in data:`). A check is performed inside this loop to remove retweets from the final results by skipping any tweet that starts with 'RT' (`if row[cols['text']][2:5] != 'RT ':`). The slice starts at position 2 because the text was stored as the string form of an encoded bytes object, so every stored value begins with the two characters `b'`. For every tweet in the CSV file, this script performs a text decoding step (`text = ast.literal_eval(...`), and subsequently cleans the text with the cleaning function that was just described (`text = clean_text(text)`). Note that this stage is only performed on the `text` and `location` columns of the tweet, since this is not necessary for the rest of the data. The decoded and cleaned tweet is then appended to a list (`decoded_row.append(text)`) which is in turn used to write all of the decoded and cleaned tweets to a new CSV file (`writer.writerow(decoded_row)`).


In [None]:
# CSV read/write library
import csv
# Abstract Syntax Trees library: ast.literal_eval turns the stored
# "b'...'" string back into a bytes object that can then be decoded
import ast

# Opens the file in variable file_name
with open(file_name, 'r', encoding='utf-8-sig') as f, \
    open(save_file, 'w+', encoding='utf-8-sig') as s: #opens new file into which cleaned text will be written
    # Reads the file (f) as a CSV file
    data = csv.reader(f)
    writer = csv.writer(s)

    # Read the first row of the CSV file
    header = next(data)
    writer.writerow(header)

    # Create a dictionary of column names:
    # 1. Goes through every element in the header list
    # 2. Enumerates all of the elements in the list (e.g. 0-'index', 1-'request', etc.)
    # 3. Creates a dictionary where the column name is the key and the column number is the value
    # With this dictionary we can ensure we always get the right column number for a given column name
    cols = {col_name: col_num for col_num, col_name in enumerate(header)}
    
    # We cycle through every row in our CSV file
    for row in data:
        decoded_row = []

        # CODE TO IGNORE RETWEETS
        if row[cols['text']][2:5] != 'RT ':
            for col_name in cols.keys():
                if col_name != 'text' and col_name != 'location':
                    decoded_row.append(row[cols[col_name]])
                else:
                    # Get the column number for this column ('text' or 'location')
                    text_col_num = cols[col_name]

                    # Rebuild the bytes object from its stored string form and decode it
                    text = ast.literal_eval(row[text_col_num]).decode(encoding='utf-8-sig', errors='ignore')
                    text = clean_text(text)
                    decoded_row.append(text)

            writer.writerow(decoded_row)
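The round trip between the collection programme and this decoding step can be seen in a few lines. The tweet text is invented; the encoding and decoding calls mirror those in the two scripts, with plain `utf-8` used here for both directions.

```python
import ast

# Invented example tweet containing a non-ASCII character
original = "RT @someone: Mná 1916"

# Collection step: encode to bytes, then store the bytes as a string
stored = str(original.encode(encoding='utf-8', errors='ignore'))
print(stored)

# The stored value starts with b' so the tweet text begins at position 2
print(stored[2:5] == 'RT ')

# Decoding step: rebuild the bytes object, then decode back to text
recovered = ast.literal_eval(stored).decode(encoding='utf-8', errors='ignore')
print(recovered == original)
```

This is why the retweet check slices `[2:5]` rather than `[0:3]`, and why `ast.literal_eval` is needed before the text can be cleaned.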

### Data Parsing

A second programme was written to transform the cleaned data into a manageable format for analysis. Making use of the ‘entities’ collected for each tweet, user mentions, hashtags and URLs were parsed and tabulated (fig. 5). Entities are presented in JSON format (linear) in the dataset and simplify the work of extracting items of interest such as hashtags, as they are in a way pre-processed. In other words, they provide ‘metadata and additional contextual information about content posted on Twitter’ <cite data-cite="4766306/ZTXA7Q3C"></cite> as a ‘series of defined attributes and values,’ <cite data-cite="4766306/RD7VKCE5"></cite> e.g.:  

In [None]:
{"hashtags": [{"text": "insert_hashtag", "indices": [00, 00]}], "urls": [{"url": "https://t.co/xxxxxxxxxx", "expanded_url": "https://twitter.com/i/web/status/xxxxxxxxxxxxx", "indices": [00, 00]}], "user_mentions": [{"screen_name": "insert_screenname", "name": "insert_name", "id": 000000000, "id_str": "000000000", "indices": [00, 00]}], "symbols": []}

Entities may also include information about media types, urls, and descriptions (e.g., image dimensions), known as ‘objects.’ This parsing exercise is useful for aggregating certain types of data within tweets, as well as a critical step in coding tweets qualitatively. The date-time format was also modified to DD-MM-YYYY. 
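As a small illustration of the parsing step, ahead of the full script, the sketch below extracts hashtags, mentions and URLs from one entities string. The entities are invented, and `summarise_entities` is a hypothetical helper that follows the same joins used in the script.

```python
import json

# Invented entities for a single tweet, serialised as they are in the CSV
entities_json = json.dumps({
    "hashtags": [{"text": "Womenof1916", "indices": [0, 12]},
                 {"text": "IWD2016", "indices": [13, 21]}],
    "urls": [{"url": "https://t.co/xxxxxxxxxx",
              "expanded_url": "https://example.org/article",
              "indices": [22, 45]}],
    "user_mentions": [{"screen_name": "example_user", "name": "Example User",
                       "id": 1, "id_str": "1", "indices": [46, 59]}],
    "symbols": [],
})


def summarise_entities(entities_json):
    """Parse the entities once and return comma-separated summaries."""
    entities = json.loads(entities_json)
    return {
        "hashtags": ', '.join(h['text'] for h in entities.get('hashtags', [])),
        "mentions": ', '.join(m['screen_name']
                              for m in entities.get('user_mentions', [])),
        "urls": ', '.join(u['expanded_url'] for u in entities.get('urls', [])),
    }


summary = summarise_entities(entities_json)
print(summary["hashtags"])   # Womenof1916, IWD2016
print(summary["mentions"])   # example_user
print(summary["urls"])       # https://example.org/article
```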

In [None]:
import csv

# Regular expression library
import re

# Abstract Syntax Trees library and HTML utilities
import ast
import html

# Parse JSON objects (for entities)
import json

from dateutil.parser import parse as dateparse

file_name = 'fullarchive.csv'
save_file = 'decoded_fullarchive.csv' #create file where data will be copied
print('Reading file: ', file_name)

# Opens the file in variable file_name
with open(file_name, 'r', encoding='utf-8-sig') as f, \
    open(save_file, 'w+', encoding='utf-8-sig') as s:
    # Reads the file (f) as a CSV file
    data = csv.reader(f)
    writer = csv.writer(s)

    header = next(data)

    cols = {col_name: col_num for col_num, col_name in enumerate(header)}

    new_header = ['date', 'time', 'screen_name', 'text', 'mentions', 'hashtags', 'location', 'favorite_count', 'retweets', 'urls','entities']
    new_dict = {col_name: '' for col_num, col_name in enumerate(new_header)}

    i = 0

    writer.writerow(new_header)

    # We cycle through every row in our CSV file
    for row in data:

        new_row = []

        if True: # filler code, comment it if removing RTs

            # Parse the JSON-encoded entities once and reuse the result
            entities = json.loads(row[cols['entities']])

            hashtags = ', '.join([hashtag['text'] for hashtag in entities['hashtags']])
            print(hashtags)

            mentions = ', '.join([mention['screen_name'] for mention in entities['user_mentions']])
            print(mentions)

            urls = ', '.join([url['expanded_url'] for url in entities['urls']])
            print(urls)

            # Split the timestamp into separate date and time values
            # (names chosen so as not to shadow the datetime and time modules)
            created = dateparse(row[cols['date']])
            tweet_date = created.date()
            tweet_time = created.time()

            new_dict['date'] = str(tweet_date)
            new_dict['time'] = str(tweet_time)
            new_dict['screen_name'] = row[cols['screen_name']]
            new_dict['text'] = row[cols['text']]
            new_dict['mentions'] = mentions
            new_dict['hashtags'] = hashtags
            new_dict['location'] = row[cols['location']]
            new_dict['favorite_count'] = row[cols['favorite_count']]
            new_dict['retweets'] = row[cols['retweets']]
            new_dict['urls'] = urls
            new_dict['entities'] = row[cols['entities']]

            for col_name in new_header:
                new_row.append(new_dict[col_name])

            print(new_row)

            i += 1

            writer.writerow(new_row)

The output of this processing is a streamlined tabulation of the data:

In [4]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 5 : Parsed entities and reformatted data"
            ]
        }
    }
}
Image(url= "media/Tweet_attributes_parsed_screenshot_blurred.jpg", metadata=metadata)

Below is a summary of the corpus of tweets. The smaller dataset was the object of the qualitative thematic analysis:

|DATE RANGE|DESCRIPTION|TOTAL TWEETS|RETWEETS REMOVED|
|----|----|----|----|
|1 Jan-31 Dec 2016|Easter Rising 2016|399,205|139,809|
|1 Aug 2015-31 Dec 2016|Women of 1916|45,564|10,981|


## Qualitative thematic analysis

This analysis of tweets was entered into less with presumptions of ‘discovering’ something that would have been impossible through a traditional reading and more with the expectation that it would confirm or substantiate certain hunches and generate new ways of thinking about and interpreting them <cite data-cite="4766306/FA2EXIT3"></cite>. Drucker has convincingly argued that automated methods and visualisations are at their core unnatural to humanistic inquiry <cite data-cite="4766306/MVJ4SFZD"></cite>. Any analysis of Twitter data must be prefaced by critical reflection on the ways in which meaning can be derived from computer-assisted methods. Automation of certain tasks allows us to probe and get to know the data in more focused ways. What the results of these automations and visualisations mean, however, is not self-evident, and it is up to the researcher to contextualise, interpret, and map them against other data and analyses: ‘We use tools not to get results but to generate questions… [computers] do not produce meaning - we do’ <cite data-cite="4766306/FA2EXIT3"></cite>. In this sense, Sinclair and Rockwell have suggested that text analysis tools are ‘Hermeneutica’ - interpretive aids for ‘thinking through’ data, and which can ‘help us try to formalize claims and to test them’ <cite data-cite="4766306/BX8IMUQP"></cite>. Further, any visualisations presented here are intended as storytelling aids and are not literal representations of reality. Word and hashtag frequencies provide a starting point to navigate the data in search of the most salient patterns, such as recurring themes or term co-occurrences. As a second level of analysis, the collected tweets are used to construct a network of interconnected terms (words or hashtags), and the visualisation of the connections in these graphs can evidence underlying clusters of (sometimes seemingly unrelated) terms.

As Boyd and Crawford have outlined, ‘big-data’ research is, from the formulation of a hypothesis, an interpretive process <cite data-cite="4766306/CQMSBDX8"></cite>, as demonstrated above in the retrieving of the dataset itself. Though not so small as to allow for purely qualitative methods, this was, in the scheme of things, a little data study. A sample of 10,000 Tweets, after the removal of retweets, proved an ideal number. Too small and (remembering the repetitive nature of commemorative tweets) we may struggle to identify and demonstrate meaningful findings. Too large and the challenge of close reading becomes proportionately more difficult. Software and computing power may also fail in the attempt to work with very large datasets, which may be somewhat mitigated by using a more powerful (and secure) university server.
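The two levels of analysis described above, frequency counts and term co-occurrence, can be sketched with the standard library. This is an illustration only and not the tooling used in the study: the three tweets are invented, and the pair counts are the kind of weighted edge list from which a network of terms could be drawn.

```python
import re
from collections import Counter
from itertools import combinations

# Invented tweets, for illustration only
tweets = [
    "Remembering the #Womenof1916 on #IWD2016",
    "Countess Markievicz and the #Womenof1916 #EasterRising",
    "#EasterRising centenary events today #Womenof1916",
]


def hashtags_in(text):
    """Return the distinct hashtags in a tweet, lower-cased and sorted."""
    return sorted(set(tag.lower() for tag in re.findall(r"#\w+", text)))


hashtag_counts = Counter()
pair_counts = Counter()
for text in tweets:
    tags = hashtags_in(text)
    # First level: how often each hashtag appears
    hashtag_counts.update(tags)
    # Second level: which hashtags appear together in the same tweet
    pair_counts.update(combinations(tags, 2))

print(hashtag_counts.most_common(2))
print(pair_counts.most_common(1))
```

Sorting the hashtags before pairing them means each pair is always counted under the same ordering, so the counts can be read directly as edge weights.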

The Twitter data collected was coded using qualitative thematic analysis aided by NVivo software. NVivo is a ‘qualitative computing’ <cite data-cite="4766306/WFAGWM85"></cite> software package that supports exploration, organisation, annotation, indexing, and coding of qualitative data. It is commonly used, for example, to manage and code interview transcripts, though tweets can also be collected through NVivo directly using the NCapture function. A body of tweets does not, however, present a linear narrative that can be inspected in the same way as interview transcripts or policy documents. Word frequencies and text queries were used as a springboard to thematic coding. These were saved iteratively as nodes and sub-nodes, i.e. coded in order to study the ‘key word in context,’ <cite data-cite="4766306/BX8IMUQP"></cite> with some tweets added *ad hoc* as appropriate. The initial coding was inductive and code labels were assigned *in vivo*; these were added to, revised, and rearranged into themes and sub-themes, eventually amalgamating into five major areas of interest. Inseparable from the theoretically driven research questions and the philosophical assumptions of the researcher, coding became an increasingly interpretive method as the organisation of data progressed. Data, after all, are not ‘coded in an epistemological vacuum’ <cite data-cite="4766306/P8J7IRXJ"></cite>. Additionally, as tweets are less dynamic in content than free-flowing speech or literature, the initial coding may not necessarily result in the extremely large numbers of nodes that often represent the first layer of interpretation of spoken word data or reams of policy documents, and this will further depend on the point at which we reach saturation. It is not claimed here that every possible detail was gleaned from this handling of the dataset, nor that every single tweet was coded. Rather, a point was reached when little new was being found or added to the nodes. With the reiteration of these nodes and sub-nodes, note-taking and aggregating, the narrative began to take a clear shape.

Myers and Hamilton caution that a greater historiographical awareness of social media as primary sources must include the ways in which social media are narrativized (teleological, technological determinism, technological dystopia), but also ‘how they themselves narrativize and thus help produce the processes of interest’ <cite data-cite="4766306/ZEZX8GEI"></cite>. This has implications for how we understand the effectivity of such platforms when we study their use in commemoration. Twitter is a highly performative space comprising multiple tensions and contingent on the conventions of its use (character limits, hashtags, likes, retweets, media sharing, etc.): ‘Platform characteristics are critical for understanding how users create, share, interact with, and mobilize content as well as for understanding how community is created and maintained in different platforms’ <cite data-cite="4766306/IVQEWRIX"></cite>. Commensurate with these conventions and culture of use, Twitter is also a reductive medium through which very short statements attempt to convey a much more complex meaning and reality, and is thus highly susceptible to a lack of nuance. Clark-Parsons reminds us how hashtags may overly simplify and even undermine structural change; Tweets, as much as the hashtags they produce, ‘trade on short but compelling narratives’ <cite data-cite="4766306/XAZ4565X"></cite>. This centennial corpus originated from pre-280 Twitter, with even less room for complexity, and narrative derived from these ‘texts’ must be tempered by such limitations. The potency of such messages is nonetheless valid when we read them with a cognisance of the boundaries and culture of the medium and situate them in their broader social and historical contexts.

# Findings

Twitter was, in 2016, a space in which historical feminism was being expressed through commemoration, achieved by engaging in a politics of representation and critical remembrance online. This exercise in historical visibility and discourse of liberation traversed temporalities of feminism in this, the women of 1916 ‘issue space’ <cite data-cite="4766306/FHW4H34J"></cite>. Below is a keyword summary of the major themes, followed by the top hashtags and word frequencies:

|Historical Information|Centenary Commentary|
|----|----|
|factual|commemoration|
|quotations|celebration|
|live-tweeting|remembering|
|stories|memory|
|GLAMs|events|
|role of women|recognition|

|Absence|Affect|Equality|
|----|----|----|
|airbrushing|pride|feminism|
|silencing|honour|gender|
|erasure|legacy|fighting|
|forgotten|bravery/heroism|freedom/liberation|
|recognition|inspiration|the proclamation|

In [5]:
from IPython.display import display

display({
  "tags": [
    "table-2",
    "full-width"
  ],
  "jdh": {
    "object": {
      "bootstrapColumLayout": {
        "md": { "span": 12 }  
      },
      "source": ["table 1 : Thematic coding"]
    }
  }
})


|Hashtag|Count|Word|Count|
|----|----|----|----|
|#womenof1916|1070|women|4750|
|#easterrising|413|1916|3480|
|#ireland2016|247|rising|2389|
|#1916rising|245|markievicz|2050|
|#internationalwomensday|233|easter|1808|
|#iwd2016|188|irish|1441|
|#wakingthefeminists|172|countess|1273|
|#ireland|138|cumann|945|
|#rebellion|110|constance|927|
|#women|106|mban|926|
|#cumannnamban|101|men|730|
|#irish|90|years|704|
|#irishhistory|87|via|702|
|#birthofanation|79|woman|698|
|#onthisday|79|grace|688|
|#womenoftherising|78|100|674|
|#dublin|69|still|600|
|#markievicz|65|fighting|552|
|#easter1916|55|gifford|538|
|#history|54|ireland|493|
|#iwd16ni|50|margaret|449|
|#women1916|49|today|419|
|#otd|48|skinnider|418|
|#feminism|46|olivia|414|
|#1916live|44|plunkett|404|
|#womenin1916|44|great|403|
|#repealthe8th|40|dublin|352|
|#rising|37|day|349|
|#rte1916|36|O’leary|346|


In [6]:
from IPython.display import display

display({
  "tags": [
    "table-2",
    "full-width"
  ],
  "jdh": {
    "object": {
      "bootstrapColumLayout": {
        "md": { "span": 12 }  
      },
      "source": ["table 2 : Hashtag and word frequencies"]
    }
  }
})


The conventions of historical commemoration are represented in the two major themes that emerged, ‘Historical Information’ and ‘Centenary Commentary,’ which together account for the majority of the tweets. These themes make for a preponderance of official and organised tweets (information, event and news sharing) and significant repetition, even with retweets systematically removed. Link sharing directly from news sources, online petitions, and automated duplication of tweets creates repetitions, but these tweets often contain supplementary text or hashtags of interest. The repetition of historical facts and statements that characterises commemorative and historical Twitter reflects the scale of engagement with the commemorations, and with particular historical phenomena. These statements, and ‘invitations to remember’ <cite data-cite="4766306/8CHKQ2KQ"></cite>, are also part of the ‘ritual discourse’ of commemoration online, and are constitutive of the tensions between remembering and forgetting. In addition, three concepts emerged as substantially recurrent, intertwined with ritual commemorative commentary and historical transmission: ‘Absence’ (critique of the historical and continued erasure of women from history), ‘Affect’ (expressing a relationship with the past through expositions of pride, honour and inspiration), and ‘Equality’ (critique of gender inequality in the present through the prism of commemoration). For an extended discussion of ‘Equality’ see: Author 1, XXX (Forthcoming 2022). The following details some of the findings from the coding and analysis of this issue space, focusing on historical and commemorative commentary. Finally, the example of International Women’s Day 2016, when the official state ceremony to commemorate the women of 1916 was held, captures the affective capital of commemoration, critique of historical absence, and a discourse of liberation in the present in this commemorative space.
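As a rough illustration of how hashtag and word counts of the kind reported in table 2 could be produced once retweets are set aside, the following is a minimal sketch and not the procedure used in this study. The sample tweets, the `RT @` retweet heuristic, the stopword list and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the study's data is not reproduced here.
tweets = [
    "Remembering the #WomenOf1916 on #IWD2016",
    "RT @example: Remembering the #WomenOf1916 on #IWD2016",
    "Countess Markievicz and the women of the #EasterRising #womenof1916",
]

# Illustrative stopword list only (an assumption, not the study's list).
STOPWORDS = {"the", "of", "on", "and", "a", "in", "to"}

HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z0-9']+")


def is_retweet(text):
    """Treat tweets that start with 'RT @' as retweets (a simple heuristic)."""
    return text.startswith("RT @")


def count_terms(texts):
    """Return (hashtag_counts, word_counts) for the non-retweet texts."""
    hashtags, words = Counter(), Counter()
    for text in texts:
        if is_retweet(text):
            continue
        lowered = text.lower()
        hashtags.update(HASHTAG_RE.findall(lowered))
        # Drop hashtags, mentions and links before counting plain words.
        plain = re.sub(r"[#@]\w+|https?://\S+", " ", lowered)
        words.update(w for w in WORD_RE.findall(plain) if w not in STOPWORDS)
    return hashtags, words


hashtag_counts, word_counts = count_terms(tweets)
print(hashtag_counts.most_common(3))
print(word_counts.most_common(3))
```

Lower-casing before counting merges variants such as #WomenOf1916 and #womenof1916, which is one plausible reason hashtags appear in lower case in a frequency table; whether to merge them is a design choice that affects the counts.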

## Historical Information & Centenary Commentary

Twitter is, above all, a platform for information transmission <cite data-cite="4766306/IVQEWRIX"></cite>. Galleries, Libraries, Archives and Museums (GLAMs), events, exhibitions, collections, and the letters, diaries, records and photographs of particular women, as well as academic research, books, conferences, lectures, and historical documentaries, were prevalent in this space. Historical information was, unsurprisingly, key to this remembrance process and, in turn, to making women visible in this politics of information. This pivoted on the *role of women* in the Easter Rising, highlighting their activism and contributions through historical statements and commentary on the memory politics of the past. It included factual tweets, historical quotations, and ‘live-tweeting’ of historical moments:

> @CenturyIRL: Elizabeth O’Farrell, nurse and Volunteer, leaves #MooreSt carrying a white surrender flag #1916LIVE

Information about Cumann na mBan (Women’s League) and female rebels and revolutionaries makes up many of these tweets, with an emphasis on telling their stories and on the commemorative cultural productions they inspired. The suffragism of many of these women also features, with some tweets looking towards the 2018 centenary of partial women’s suffrage and asking how it would be adequately commemorated. The complexity of women’s involvement in Easter Week 1916 is necessarily lost in the reductive nature of tweets, and indeed we cannot expect such complexity of this medium. Of greater interest is how people engaged with and understood these historical roles and their consequences for the course of Irish history and gender politics up to the present day, and how they expressed these meanings through the medium of Twitter. Many tweets characterise the contributions of women as ‘brave’, ‘vital’, ‘key’, ‘integral’, ‘critical’, ‘central’ and so on. A distinct tension emerged between the ‘forgotten’, ‘untold’, ‘ignored’, ‘unknown’ or ‘hidden’, and the remembering, recognition, retelling, celebration and reclaiming of women’s roles and stories. This tension has the dual function of public history-making and critical commentary <cite data-cite="4766306/8NUABFVV"></cite>, making visible both the politics of the past and that which was marginalised.

In [7]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 6: Hashtag co-occurrence clusters: #womenof1916. Generated in Gephi using a modularity algorithm"
            ]
        }
    }
}
Image(url= "media/Picture 1.jpg", metadata=metadata)

In [8]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 7: Hashtag co-occurrence clusters: #EasterRising. Generated in Gephi using a modularity algorithm"
            ]
        }
    }
}
Image(url= "media/Picture 2.jpg", metadata=metadata)
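Hashtag co-occurrence networks like those in figures 6 and 7 start from a weighted edge list, in which two hashtags are linked each time they appear in the same tweet. The sketch below shows one way such a list could be built and serialised as CSV for import into a network tool such as Gephi. It is an illustration under stated assumptions, not the workflow used for the figures: the sample tweets and function names are hypothetical, and the clustering itself (the modularity step) is left to Gephi.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample tweets, for illustration only.
tweets = [
    "#WomenOf1916 remembered on #IWD2016 #Ireland2016",
    "#womenof1916 and #EasterRising",
    "#EasterRising #Ireland2016 #womenof1916",
]

HASHTAG_RE = re.compile(r"#\w+")


def cooccurrence_edges(texts):
    """Count how often each unordered pair of hashtags shares a tweet."""
    edges = Counter()
    for text in texts:
        # A set avoids counting a hashtag twice within one tweet;
        # sorting makes each pair order-independent.
        tags = sorted(set(HASHTAG_RE.findall(text.lower())))
        edges.update(combinations(tags, 2))
    return edges


def edges_to_csv(edges):
    """Serialise the edges with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


edges = cooccurrence_edges(tweets)
print(edges_to_csv(edges))
```

The edges are undirected, since co-occurrence in a tweet has no inherent direction, and the weight records how many tweets contain both hashtags.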

‘Centenary commentary’ can be divided into the sub-themes ‘commemoration’, ‘celebration’, and ‘remembering’, and in each case invitations and obligations to remember, commemorate and celebrate are salient. Commemoration, the broadest of these, includes information and announcements about, and invitations to, events such as those for International Women’s Day, commemorative ceremonies such as wreath-laying and plaque dedications, commemorative campaigns, and craft, theatre and musical performances, all of which pay homage to, highlight and explore the role of women in the 1916 Rising. Pride, honour and concomitant expressions of emotion, together with overwhelmingly positive sentiment towards the commemorations and related cultural productions, are also a feature of this sub-theme. Cultural productions such as theatre and musical performances, as modes of remembering, celebrating, and commemorating individual women, are similarly well received in this issue space. One exception was the TV mini-series *Rebellion*, aired between February and March 2016, which dramatized the Rising largely from the point of view of three fictional women. This received a more ambivalent reception, with some tweets appreciative of the female perspective yet critical of the somewhat ironic exclusion of nurse Elizabeth O’Farrell, famously ‘airbrushed’ from a *Daily Sketch* photograph of the surrender scene in a report on the insurrection in 1916. This ‘airbrushing’, though somewhat contested, has become emblematic of the side-lining of women in Irish history, and was prevalent in the discourse of absence in this issue space.

Also included is commentary on the inclusiveness of the centenary programme, comparisons with the position of women in the 1966 fiftieth anniversary commemorations, and repeated declarations that ‘finally’ these historical women and their roles in the foundation of the State were being acknowledged. A sense that women had at long last become ‘worthy’ or deserving of recognition also emerges in relation to this unprecedented public and commemorative attention. 

> @ireland: The #womenof1916 is a central theme of #Ireland2016 commemorations.

However much the Expert Advisory Group on the commemorations argued against celebratory notions of official remembrance, haunted by the excesses of 1966 that were retrospectively blamed for stoking the outbreak of the northern 'Troubles,' the centenary as a ‘celebration’ was apparent in the prevalence of the term in this issue space, as well as in the larger dataset. These statements were mainly concerned with celebrating particular women, such as Constance Markievicz, Cumann na mBan, the role of women in the Rising more generally, their bravery and legacies, and events dedicated to their valorisation.

> @ireland2016: This year we remembered the bravery and idealism of the women of 1916 and honoured the women of today.

Many declared their remembrance of the courage and sacrifices of the women and men involved in the Rising, describing them variously as heroes, patriots, and courageous, while others referred to honouring their memory and paying tribute or homage to them. Recognition is again expressed as something long-awaited and justified. Still other tweets reminded readers that we must remember the issues that women face *today* even as we commemorate 1916 and its female protagonists, and that these contemporary problems are linked with the need to engage in critical, feminist remembrance. Several tweets asked that we recall not just women’s role in 1916, but equally their legacies of gender equality work. Commemoration, after all, is a present-centred meaning-making process and the stories of the women of 1916 became the narrative that, for many, spoke ‘more directly to latter-day concerns and are more relevant to latter-day identity formations’ <cite data-cite="4766306/Z52TT2VH"></cite>. As seen in the hashtag collocations below, #Repealthe8th (referring to the then ongoing abortion rights campaign) and #WakingTheFeminists (a movement sparked by gender discrimination in the National Theatre’s programme for the 1916 centenary) feature in this space. Abortion rights activists in turn invoked the feminist ghost of the past in their annual ‘March for Choice’ in September 2016, which reimagined historical imagery and referenced female revolutionaries and the egalitarian language of the 1916 Proclamation under the banner ‘Rise and Repeal’ <cite data-cite="4766306/4QUIXFNN"></cite>.

## International Women’s Day

Within the theme ‘Equality’ was a cluster of tweets referring to International Women’s Day (IWD), 8 March 2016, where a critique of gender inequality in Ireland continued to be the subtext to both official and unofficial remembrance. IWD was appropriated for the official state ceremony to commemorate the role of women in the 1916 Rising, held prior to the main centenary event on Easter Sunday, and it is evident as the second largest peak of total tweets in this dataset (fig. 8 below). IWD is itself an internationally recognised day to commemorate women globally, and in this issue space it was a platform to make women (historical and contemporary) visible and to express affective communion with women of the past and a discourse of liberation in the present.
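Peaks of this kind can be located by counting tweets per calendar day and ranking the days. The sketch below is a minimal illustration, assuming each tweet carries an ISO 8601 timestamp; the timestamps are invented for the example and the function names are hypothetical, so the output says nothing about the actual dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps in ISO 8601 form; not the study's data.
created_at = [
    "2016-03-08T09:15:00+00:00",
    "2016-03-08T12:40:00+00:00",
    "2016-03-27T10:00:00+00:00",
    "2016-03-27T10:05:00+00:00",
    "2016-03-27T18:30:00+00:00",
    "2016-04-24T08:00:00+00:00",
]


def daily_counts(timestamps):
    """Count tweets per calendar day (in each timestamp's own offset)."""
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


def top_peaks(counts, n=2):
    """Return the n busiest days as (date, count) pairs, busiest first."""
    return counts.most_common(n)


peaks = top_peaks(daily_counts(created_at))
for day, count in peaks:
    print(day.isoformat(), count)
```

Days are taken from the timestamp as given, so tweets posted close to midnight may fall on a different day from the local Irish date; converting to a single time zone first would be a reasonable refinement.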

In [10]:
from IPython.display import Image
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 8: Total Tweets ‘Women of 1916’ (c. 10k original Tweets i.e., retweets removed) Aug 2015 - Dec 2016"
            ]
        }
    }
}
Image(url= "media/total-tweets-womenof1916.jpg", metadata=metadata)


Figures 9 and 10 below show clusters of hashtags around the topic of IWD, and to a lesser extent ‘Proclamation Day,’ with hashtags like #womenofcourage, #inclusion, #hero, #equality4all, #genderequity, #inspiringwomen, #theproclamation, #1916women to name a few. Figure 10 shows how keywords like ‘honouring,’ ‘tribute,’ ‘legacy,’ ‘equality,’ ‘inspirational,’ ‘proud,’ ‘celebrating,’ and ‘heroes’ appear in collocation with ‘international women’s day.’

In [11]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 9: Sections of hashtag co-occurrences showing IWD clusterings"
            ]
        }
    }
}
Image(url= "media/Picture 3.jpg", metadata=metadata)


In [12]:
from IPython.display import Image 
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "figure 10: Sections of hashtag co-occurrences showing IWD clusterings"
            ]
        }
    }
}
Image(url= "media/Picture 4.jpg", metadata=metadata)


A State event was held at Kilmainham Hospital, Dublin, on IWD 2016. Passages from the keynote speech made by the President of Ireland, Michael D. Higgins, at the ceremony were referenced in many tweets, passages that weigh in on remembering, forgetting, and the position of women in Irish society past and present e.g.:

> …those who were long described as ‘the forgotten women of 1916’ are not forgotten any more... we reflect, together, on all that remains to be done if we are to live up to the dreams of equality and justice that animated those women from our past <cite data-cite="4766306/HCCGEC7X"></cite>.

The tone of tweets surrounding IWD is celebratory, focusing on the legacy of the women insurgents, underscoring their role as ‘key’, ‘significant’ and ‘pivotal’, and expressing inspiration, pride and solidarity in celebrating and paying tribute, with their ‘bravery and idealism’ re-emphasised. IWD events and cultural productions are described variously as fitting, beautiful, and perfect homages. Acknowledgement, recognition, and the imperative to commemorate and remember again characterised these tweets. Continuity between 1916 and 2016 was more explicit here, as IWD carried a celebratory discourse directed towards women of the past as well as the women of today. Further to this was acknowledgement of the ways in which the women of the past and their roles in history have shaped the Irish nation today, the status of women in it, and individuals' sense of identity. Some tweets assert the women of 1916 and Irish women in history as shapers of the nation and national identity. Equally, there was recognition of how much work is still needed to achieve equality in the present. Many tweets, for example, pointed to the slow progress of representation in Irish politics, gender quotas for electoral nominations having recently been introduced in time for the February 2016 general election. To a much greater extent, this was to become the backdrop to the 2018 centenary of partial female suffrage <cite data-cite="4766306/QXWRKCQZ"></cite>.

## The anti-feminist backlash?

What was not readily apparent in this tweet data was evidence of the kind of anti-feminist or misogynistic discourse that is often expected in such research and on social media forums, and for which Twitter is a ready petri dish. Banet-Weiser has considered this intractable duality of ‘popular feminism’ and what she calls ‘popular misogyny,’ a ‘defence against feminism and its putative gains,’ as they play out in multiple media settings including social media and comment-enabled platforms <cite data-cite="4766306/VPHIC8ET"></cite>. The explanation, in respect of this study, has to do with the data source and the limitations of the API-based retrieval methods. At the time of collecting the tweet data in 2019, it was not possible to capture a cascade of full-text tweet replies or Tweet IDs through the Premium API service, and replies are where we would expect to find this type of discourse. This problem has also been faced by hate speech researchers, for example, when attempting to scale up research methodologies and data collection from phenomena observed in-platform. Since this data was collected, new endpoint features have been added, including a ‘conversation_id’ parameter to better track these threads.
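To make the idea of tracking threads concrete, the sketch below groups tweet records into conversations by their `conversation_id` and builds the corresponding search query string. It is a hedged illustration only: the records are invented, the function names are hypothetical, and it assumes the Twitter API v2 conventions that each tweet object can carry a `conversation_id` equal to the ID of the tweet that started the thread, and that `conversation_id:` can be used as a search operator. No request is sent, and this was not part of the data collection described here.

```python
from collections import defaultdict

# Hypothetical tweet records. The field names follow the v2 tweet
# object (id, conversation_id, text); the values are invented.
tweets = [
    {"id": "100", "conversation_id": "100", "text": "Original tweet"},
    {"id": "101", "conversation_id": "100", "text": "First reply"},
    {"id": "102", "conversation_id": "100", "text": "Reply to the reply"},
    {"id": "200", "conversation_id": "200", "text": "Unrelated tweet"},
]


def conversation_query(tweet_id):
    """Build a search query string that targets one thread."""
    return f"conversation_id:{tweet_id}"


def group_by_conversation(records):
    """Map each conversation_id to the list of tweets in that thread."""
    threads = defaultdict(list)
    for record in records:
        threads[record["conversation_id"]].append(record)
    return dict(threads)


threads = group_by_conversation(tweets)
# Replies are the tweets whose own id differs from the thread's id.
replies = [t for t in threads["100"] if t["id"] != t["conversation_id"]]
print(conversation_query("100"), len(replies))
```

Grouping in this way would allow replies, where oppositional discourse is most likely to surface, to be read in the context of the tweet that prompted them.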

As outlined above, no query can capture every relevant tweet, nor return a fully representative dataset. Another partial explanation, therefore, is that those expressing opposition to this feminist remembrance politics simply did not engage in the hashtags or use keywords or phrases that matched the query used to collect this dataset. Whether this reflects a deliberate eschewing of hashtags (which are, after all, designed to connect and make visible) or of otherwise identifying language can only be speculated upon. However, it raises questions for future re-analysis and methodological development. In this sense, the dataset and analysis are grounded in the ‘hashtag women of 1916’ narrative on Twitter primarily from the perspective of its participants, itself a feminist practice and a practice of feminist ethics <cite data-cite="4766306/XAZ4565X"></cite><cite data-cite="4766306/5EG6DA6S"></cite>.

# Conclusion

What this assemblage of tweets says together is that, for Irish feminists, historical commemoration was as much a lens for critically engaging the present as it was the past. Reflecting on the decade of remembrance, McGarry has suggested that more than ‘simply re-enacting the past, the most successful forms of commemoration allow for its energies to illuminate the possibility of alternative futures’ <cite data-cite="4766306/HRWBICGT"></cite>. In 2016, Twitter was just one avenue through which a renewed gender historical consciousness was being expressed and transmitted, and in which Irish feminists were simultaneously challenging the past and the present. The occasion of the centenary year may have created the conditions in which tensions around inclusion or exclusion might boil over in different cultural spaces; however, this moment came on the back of decades of feminist historical scholarship and the opening up and digitisation of archives, both of which flourished in the lead up to and during the Decade of Centenaries. While this body of tweets cannot be described as representative of the public, being only one snapshot of the 2016 commemorations online, it is nonetheless indicative of a process that did not occur in isolation. This exercise in visibility, representation and critical commemoration via the medium of Twitter represents a wider process of actualizing the women of 1916 in ‘official’ or authorised commemoration, history, and communal memory. It is evidence of a wider public shift in the interpretation of the past, one that has continued to gather momentum as the second phase of the commemorative decade has progressed.

One of the greatest difficulties of this study was to balance a rigorously ethical protocol with the evidencing of sources, and to create a meaningful narrative from the data within this framework: ‘As internet technologies and practices continue to change, it has become increasingly difficult, even for experienced researchers, to know how best to achieve effective and ethical research online’ <cite data-cite="4766306/SNPUBNK9"></cite>. Given the volatile nature of this data environment, and the reactive policies of social media companies in the face of platform abuses, we should perhaps remain cautious in our optimism around academic access to the Twitter archive. Twitter regularly moves the goalposts, meaning that social media research is always vulnerable to new restrictions created by changes to the APIs, such as the ongoing move to the Twitter API v.2; data collection tools do not always keep pace with such changes and regularly become obsolete. It is an ever-evolving challenge: technically, ethically, epistemologically, methodologically. Twitter Inc., after all, exists to profit from the instant gratification, reductive cultural politics, and lack of nuance that the platform facilitates in its users. We must therefore be cautious about what we infer from either individual tweets or quantifications within the constraints and fluctuating social norms of the medium. Universities are beginning to create better and more comprehensive guidelines for entering into this endeavour, particularly in light of GDPR, which will help researchers make the most of their data while maintaining ethical rigour. Yet social media requires a level of source criticism, ‘digital hermeneutics,’ <cite data-cite="4766306/MFVEN6XV"></cite> and an ethics of care that is still poorly documented and articulated in digital and public history.

Important sociological and historical conversations and movements are happening in social media spaces even as these spaces manufacture toxicity and platforms are implicated in undermining global democracy, a paradox we all live with whether we are on these platforms or not. This article has demonstrated the feminist historical work that was being expressed through Irish Twitter in 2016. We have presented just one set of data collection and analysis methods that are possible for historical Twitter research, one that contributes directly to a growing body of literature facing the fluctuating challenges of social media research for the study of the past in the present online.

# References

<div class="cite2c-biblio"></div>