## Imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import json
from pprint import pprint

In [None]:
zipfilename = "../inputs/twitter-2022-02-17-cf8888eb631a941f287fbfec1a2662e1127775f1ba68efad59880f2fafdcfea7.zip"

## Parsing

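- We parse the ZIP archive exported by Twitter, using a `ZIPParser` with `TwitterJSParser` as its `parser` argument. Only the `.js` files under `data/` are selected, through the `regex` argument.
- This parsing returns a dictionary of Trees to explore, keyed by file path (e.g. `"data/account.js"`).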
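Each file in the archive's `data/` folder is JavaScript rather than plain JSON: a JSON array assigned to a global, such as `window.YTD.account.part0 = [...]`. That is why a dedicated `TwitterJSParser` is needed. Here is a minimal stdlib sketch of the idea, assuming exactly that layout. `load_twitter_js` and `load_archive` are hypothetical helpers, not Argonodes functions, and the real parser may work differently:

```python
import io
import json
import re
import zipfile

# Mirrors the regex passed to ZIPParser: .js files under data/.
DATA_JS = re.compile(r"data/.*\.js$")


def load_twitter_js(text):
    """Parse one archive file, assuming the form `window.YTD.<name>.partN = <JSON>`."""
    # Everything before the first "=" is the JavaScript assignment; the rest is JSON.
    _, sep, payload = text.partition("=")
    if not sep:
        raise ValueError("no JavaScript assignment found")
    return json.loads(payload)


def load_archive(zip_source):
    """Return {member name: parsed JSON} for every data/*.js member of the ZIP."""
    parsed = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if DATA_JS.search(name):
                # "utf-8-sig" also tolerates a leading byte-order mark, if any.
                parsed[name] = load_twitter_js(archive.read(name).decode("utf-8-sig"))
    return parsed


# Usage on a small in-memory archive (made-up content, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/account.js", 'window.YTD.account.part0 = [{"account": {"username": "example"}}]')
    archive.writestr("Your archive.html", "<html></html>")
print(load_archive(buffer))  # {'data/account.js': [{'account': {'username': 'example'}}]}
```

Non-matching members such as the HTML viewer are skipped, just as the `regex` argument restricts what `ZIPParser` hands to `TwitterJSParser`.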

In [None]:
from argonodes.parsers import ZIPParser, TwitterJSParser
zipparser = ZIPParser(parser=TwitterJSParser, regex=r"data\/.*\.js$", extension="js", verbose=1)
trees = zipparser(zipfilename)

## Tree exploration

- For each Tree, we do a bit of exploration and try to add information.

In [None]:
from argonodes.nodes import NA

### Installing optional dependencies

We are going to use `FoundRegex`, which relies on an external package, `tdda`, that has to be installed separately (for example with `pip install tdda`).

In [None]:
from argonodes.appliers import FoundRegex
found_regex = FoundRegex()

#### Do we have any patterns?

We are going to run a "full example" with `FoundRegex` over every tree, but in some cases you may not need this step at all.

In [None]:
from datetime import datetime

print(datetime.now())
total = len(trees)
for i, (name, tree) in enumerate(trees.items()):
    found_regex(tree)
    print(f"{datetime.now()}: {i+1}/{total}: {name}")
#     if i == 3:  # Because it can take some time...
#         break
pprint(found_regex.data)
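To illustrate what "finding a pattern" means, here is a deliberately naive stdlib sketch of character-class generalization: each character is mapped to a class (ASCII digit, ASCII letter, or the escaped literal), runs of the same class are collapsed into a quantifier, and a regex is returned only when every example has the same sequence of classes. `naive_pattern` is a hypothetical helper; it is not the algorithm used by `tdda` or `FoundRegex`, which handle far more cases:

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a regex fragment: ASCII digit, ASCII letter, or escaped literal."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def shape(value):
    """Collapse runs of the same class: '10.0' -> [('\\d', 2), ('\\.', 1), ('\\d', 1)]."""
    return [(cls, len(list(run))) for cls, run in groupby(map(char_class, value))]


def naive_pattern(values):
    """Return one anchored regex covering every value, or None if their shapes differ."""
    if not values:
        return None
    shapes = [shape(v) for v in values]
    classes = [cls for cls, _ in shapes[0]]
    if any([cls for cls, _ in s] != classes for s in shapes):
        return None
    parts = []
    for i, cls in enumerate(classes):
        # Quantifier bounds come only from the lengths observed in the examples.
        lengths = [s[i][1] for s in shapes]
        low, high = min(lengths), max(lengths)
        if low == high == 1:
            parts.append(cls)
        elif low == high:
            parts.append(f"{cls}{{{low}}}")
        else:
            parts.append(f"{cls}{{{low},{high}}}")
    return "^" + "".join(parts) + "$"


print(naive_pattern(["192.168.0.1", "10.0.0.255"]))  # ^\d{2,3}\.\d{1,3}\.\d\.\d{1,3}$
print(naive_pattern(["abc", "123"]))  # None
```

The IPv4-like result shows the limitation of this approach: the quantifiers only reflect the lengths seen in the examples, so the inferred regex is much tighter than a real IPv4 pattern.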

We are going to follow the order of the `README.txt` file provided by Twitter.

### "SENSORY INFORMATION"

"(Audio, electronic, visual, and similar information)"

#### `periscope-expired-broadcasts.js`

In [None]:
cur_tree = trees["data/periscope-expired-broadcasts.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- broadcastIds: A list of the broadcast IDs posted by the shell account that have expired and cannot be encoded.
- reason: Explanation of why broadcast replay files are unavailable (hard-coded).

#### `spaces-metadata.js`

In [None]:
cur_tree = trees["data/spaces-metadata.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- id: Unique id for the space.
- creatorUserId: The space creator’s Twitter user ID.
- hostUserIds: Twitter user IDs of users that have admin/moderator authorization of this space.
- speakers: Users that have participated in this space. It includes participants’ Twitter user IDs and the start/end time of their spoken sessions. If the data archive is generated while the space is live, it will include only the speakers active at that moment. If the space has finished, it will include everyone who participated.
- createdAt: Space creation time.
- endedAt: Space end time.
- totalParticipating: Total number of users participating in the space when the data archive is generated.
- totalParticipated: Total number of users that have participated in this space.
- invitedUserIds: Twitter user IDs of users that are chosen by the host through space conversation control.

### "IDENTIFIERS"

"(Real name, alias, postal address, telephone number, unique identifiers (such as a device identifier, cookies, mobile ad identifiers), customer number, Internet Protocol address, email address, account name, and other similar identifiers)"

#### `account-creation-ip.js`

In [None]:
cur_tree = trees["data/account-creation-ip.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

In [None]:
cur_tree.set_attributes(
    "data/account-creation-ip.js:$",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="What IP was used to create that account.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account-creation-ip.js:$[*].accountCreationIp.accountId",
    descriptiveType="https://schema.org/identifier",
    unique=True,
    default=NA,
    description="Unique identifier for the account.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account-creation-ip.js:$[*].accountCreationIp.userCreationIp",
    descriptiveType="https://github.com/hestiaAI/Argonodes/wiki/General:IPv4",
    unique=True,
    default=NA,
    description="IP address at account creation.",
    choices=NA,
    regex=[r"(\b25[0-5]|\b2[0-4][0-9]|\b[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"],
)
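The IPv4 regex registered for `userCreationIp` is not anchored, so `re.search` also accepts strings that merely contain an address, or a valid prefix of one. A quick check with `re.fullmatch`, using the standard library's `ipaddress` module as a reference validator; `is_ipv4_regex` and `is_ipv4_stdlib` are illustrative helpers:

```python
import ipaddress
import re

IPV4 = r"(\b25[0-5]|\b2[0-4][0-9]|\b[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"


def is_ipv4_regex(value):
    # fullmatch requires the whole string to be an address, unlike re.search.
    return re.fullmatch(IPV4, value) is not None


def is_ipv4_stdlib(value):
    # ipaddress raises AddressValueError (a ValueError subclass) on invalid input.
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


for candidate in ["192.168.1.10", "1.1.1.256", "1.2.3", "ip=10.0.0.1"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_stdlib(candidate), re.search(IPV4, candidate) is not None)

# Output (fullmatch, ipaddress, search):
# 192.168.1.10 True True True
# 1.1.1.256 False False True
# 1.2.3 False False False
# ip=10.0.0.1 False False True
```

With `re.search`, `1.1.1.256` is accepted because the prefix `1.1.1.25` matches; anchoring (or `fullmatch`) avoids that.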

#### `contact.js`

In [None]:
cur_tree = trees["data/contact.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- id: Unique identifiers for the contacts imported to the account.
- emails: Emails of the contacts imported to the account.
- phoneNumbers: Phone numbers of the contacts imported to the account.

#### `email-address-change.js`

In [None]:
cur_tree = trees["data/email-address-change.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `ip-audit.js`

In [None]:
cur_tree = trees["data/ip-audit.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `periscope-account-information.js`

In [None]:
cur_tree = trees["data/periscope-account-information.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `periscope-ban-information.js`

In [None]:
cur_tree = trees["data/periscope-ban-information.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `phone-number.js`

In [None]:
cur_tree = trees["data/phone-number.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `screen-name-change.js`

In [None]:
cur_tree = trees["data/screen-name-change.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

### "ONLINE ACTIVITY"

"(Internet and other electronic network activity information, including, but not limited to, information regarding interactions with websites, applications, or advertisements)"

#### `account-suspension.js`

In [None]:
cur_tree = trees["data/account-suspension.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- timeStamp: Date and time of a suspension action.
- action: Action taken regarding account suspension. Accounts are unsuspended by default. This file will be empty unless the account was suspended at some point.

#### `account-timezone.js`

In [None]:
cur_tree = trees["data/account-timezone.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `account.js`

In [None]:
cur_tree = trees["data/account.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `ad-engagements.js`

In [None]:
cur_tree = trees["data/ad-engagements.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `ad-impressions.js`

In [None]:
cur_tree = trees["data/ad-impressions.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `ad-mobile-conversions-attributed.js`

In [None]:
cur_tree = trees["data/ad-mobile-conversions-attributed.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- ad: Mobile application events associated with the account in the last 90 days which are attributable to a Promoted Tweet engagement on Twitter.
- attributedConversionType: Type of activity specifically associated with the event.
- mobilePlatform: Platform on which the event happened. For example: iOS or Android.
- conversionEvent: Information about the event itself such as installing or signing up.
- applicationName: Name of the application in which the event occurred.
- conversionValue: Value associated with the event.
- conversionTime: Date and time of the event.
- additionalParameters: Other optional parameters associated with the event such as a currency or product category.

#### `ad-mobile-conversions-unattributed.js`

In [None]:
cur_tree = trees["data/ad-mobile-conversions-unattributed.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- ad: Mobile application events associated with the account in the last 10 days which may become attributable to a Promoted Tweet engagement on Twitter in the future.
- mobilePlatform: Platform on which the event happened. For example: iOS or Android.
- conversionEvent: Information about the event itself such as installing or signing up.
- applicationName: Name of the application in which the event occurred.
- conversionValue: Value associated with the event.
- conversionTime: Date and time of the event.
- additionalParameters: Other optional parameters associated with the event such as a currency.

#### `ad-online-conversions-attributed.js`

In [None]:
cur_tree = trees["data/ad-online-conversions-attributed.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- ad: Web events associated with the account in the last 90 days which are attributable to a Promoted Tweet engagement on Twitter.
- attributedConversionType: Type of activity specifically associated with the event.
- eventType: Information about the event itself such as viewing a page.
- conversionPlatform: Platform on which the event happened. For example: desktop.
- advertiserInfo: Advertiser name and screen name.
- conversionValue: Value associated with the event.
- conversionTime: Date and time of the event.
- additionalParameters: Other optional parameters associated with the event such as a currency or product category.

#### `ad-online-conversions-unattributed.js`

In [None]:
cur_tree = trees["data/ad-online-conversions-unattributed.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

Note: No data available, but according to the README:

- ad: Web events associated with the account in the last 90 days which may become attributable to a Promoted Tweet engagement on Twitter in the future.
- eventType: Information about the event itself such as viewing a page.
- conversionPlatform: Platform on which the event happened. For example: desktop.
- conversionUrl: URL of the website on which the event occurred.
- advertiserInfo: Advertiser name and screen name.
- conversionValue: Value associated with the event.
- conversionTime: Date and time of the event.
- additionalParameters: Other optional parameters associated with the event such as a currency or product category.

#### `app.js`

In [None]:
cur_tree = trees["data/app.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `birdwatch-note-rating.js`

In [None]:
cur_tree = trees["data/birdwatch-note-rating.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `birdwatch-note.js`

In [None]:
cur_tree = trees["data/birdwatch-note.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `block.js`

In [None]:
cur_tree = trees["data/block.js"]
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### `branch-links.js`

#### `community_tweet.js`

#### `connected-application.js`

#### `device-token.js`

#### `direct-message-group-headers.js`

#### `direct-message-headers.js`

#### `direct-message-mute.js`

#### `direct-messages-group.js`

#### `direct-messages.js`

#### `follower.js`

#### `following.js`

#### `like.js`

#### `lists-created.js`

#### `lists-member.js`

#### `lists-subscribed.js`

#### `moment.js`

#### `mute.js`

#### `ni-devices.js`

#### `periscope-broadcast-metadata.js`

#### `periscope-comments-made-by-user.js`

#### `periscope-followers.js`

#### `periscope-profile-description.js`

#### `professional_data.js`

#### `profile.js`

#### `protected-history.js`

#### `reply-prompt.js`

#### `saved-search.js`

#### `smartblock.js`

#### `tweet.js`

#### `tweetdeck.js`

#### `user-link-clicks.js`

#### `verified.js`

### "INFERENCES"

"(Inferences drawn to create a profile about the user reflecting their preferences, characteristics, predispositions, behavior, and attitudes)"

#### `personalization.js`

### "PROTECTED CLASSIFICATIONS"

"(Characteristics of certain legally protected classifications.)"

"For information about the language(s), gender, and age associated with the account (which may be inferred), please refer to [personalization.js](#personalization.js)."

#### `ageinfo.js`

### "LOCATION DATA"

"For location data associated with the account, please refer to location in [profile.js](#profile.js) and locationHistory in [personalization.js](#personalization.js). For information about a Periscope broadcast location, please refer to [periscope-broadcast-metadata.js](#periscope-broadcast-metadata.js)."

### data/account-creation-ip.js

### data/account-suspension.js

In [None]:
cur_tree = trees["data/account-suspension.js"]
cur_tree

#### ... Nothing in there...

### data/account-timezone.js

In [None]:
cur_tree = trees["data/account-timezone.js"]

In [None]:
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### Adding information where we can

In [None]:
cur_tree.set_attributes(
    "data/account-timezone.js:$",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description=NA,
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account-timezone.js:$[*].accountTimezone.accountId",
    descriptiveType="https://schema.org/identifier",
    unique=True,
    default=NA,
    description="Unique account ID for that user.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account-timezone.js:$[*].accountTimezone.timeZone",
    descriptiveType="https://schema.org/scheduleTimezone",
    unique=False,
    default=NA,
    description="Timezone used when creating the account.",
    choices=NA,
    regex=[r"\w+"],
)

### data/account.js

In [None]:
cur_tree = trees["data/account.js"]

In [None]:
print(f"Filename: {cur_tree.filename}")
print(f"Paths:\n{cur_tree.get_paths_fancy()}")

#### Adding information where we can

In [None]:
cur_tree.set_attributes(
    "data/account.js:$",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description=NA,
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.accountDisplayName",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="Current display name for that account.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.accountId",
    descriptiveType="https://schema.org/identifier",
    unique=True,
    default=NA,
    description="Unique account ID for that user.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.createdAt",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="Timestamp for the creation of that account.",
    choices=NA,
    regex=NA,
)
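The `createdAt` entry is registered with `regex=NA`. Assuming the archive stores it as an ISO 8601 UTC timestamp with milliseconds, such as `2011-12-05T09:31:28.000Z` (an assumption to check against the actual `account.js`), a candidate regex and a parsing check could look like this; `CREATED_AT` and `parse_created_at` are hypothetical names:

```python
import re
from datetime import datetime, timezone

# Assumed shape: YYYY-MM-DDTHH:MM:SS.mmmZ (UTC, millisecond precision).
CREATED_AT = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$"


def parse_created_at(value):
    """Check the assumed format, then return a timezone-aware UTC datetime."""
    if re.fullmatch(CREATED_AT, value) is None:
        raise ValueError(f"unexpected createdAt format: {value!r}")
    # Since Python 3.7, %z accepts a literal "Z" as UTC.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z").astimezone(timezone.utc)


created = parse_created_at("2011-12-05T09:31:28.000Z")
print(created.isoformat())  # 2011-12-05T09:31:28+00:00
```

If the real values turn out to differ (no milliseconds, another offset), only `CREATED_AT` and the `strptime` format need adjusting.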

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.createdVia",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="Platform used to create that account.",
    choices=NA,
    regex=NA,
)

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.email",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="Email linked to that account.",
    choices=NA,
    regex=[r"^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$"],
)
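In Python's `re` module, a character class such as `[\w-\.]` is an error: `\w-\.` is read as a range whose start is a class escape, so compiling it raises `re.error` ("bad character range"). Putting the hyphen last, as in `[\w.-]`, keeps the intended set (word characters, dots and hyphens). A quick check; keep in mind this kind of regex is a rough filter, not full email address validation:

```python
import re

# Hyphen placed last in the class so it is read as a literal character.
EMAIL = r"^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$"

# The form with "\w-\." inside the class is refused at compile time.
try:
    re.compile(r"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$")
except re.error as exc:
    print("rejected:", exc)

for candidate in ["jane.doe@example.com", "first-last@mail.example.org", "no-at-sign.example.com", "user@example.technology"]:
    print(candidate, re.fullmatch(EMAIL, candidate) is not None)
```

The first two candidates match and the last two do not. The last one fails because `{2,4}` caps the top-level domain at four characters, which longer TLDs exceed.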

In [None]:
cur_tree.set_attributes(
    "data/account.js:$[*].account.username",
    descriptiveType=NA,
    unique=True,
    default=NA,
    description="Current username for that account.",
    choices=NA,
    regex=[r"^(\w){1,15}$"],
)
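In Python 3, `\w` in a `str` pattern matches any Unicode word character, so the username regex registered for `account.username` also accepts values like `café`. Twitter usernames are limited to 15 characters drawn from ASCII letters, digits and underscores; if the regex should encode that rule, the `re.ASCII` flag narrows `\w` to `[A-Za-z0-9_]`. The pattern below is the registered one without its redundant capturing group:

```python
import re

USERNAME = r"^\w{1,15}$"

for candidate in ["jack", "café", "way_too_long_username", "dash-name"]:
    unicode_ok = re.fullmatch(USERNAME, candidate) is not None
    ascii_ok = re.fullmatch(USERNAME, candidate, flags=re.ASCII) is not None
    print(f"{candidate!r}: default={unicode_ok}, ascii={ascii_ok}")

# 'jack': default=True, ascii=True
# 'café': default=True, ascii=False
# 'way_too_long_username': default=False, ascii=False
# 'dash-name': default=False, ascii=False
```

Only the non-ASCII candidate behaves differently between the two modes; length and forbidden characters are rejected either way.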

#### ... etc.

## Model

- We are now creating the Model based on the different trees.
- This Model will contain all our definitions, along with the correct paths.

In [None]:
from argonodes.models import Model
model = Model(trees=trees.values(), name="Twitter")

In [None]:
model.changes[-1]

### Exporting the Model

In [None]:
model.export_traversal(filename="../models/Twitter.md", scheme="markdown")

In [None]:
model.export_traversal(filename="../models/Twitter.json", scheme="json")

#### Preview

In [None]:
model.export_traversal(scheme="markdown")