# Basics of the Telegram API via telethon
- Searching for channels
- Collecting channel data
- Collecting messages
---
Prerequisites:
- Basics of asyncio
- Basics of time and date handling in Python

# The Async Environment
- the telethon package is built on the asyncio framework
- Pros:
    - we can run tasks concurrently, i.e. parallelize them to a certain extent
    - we can use the telethon API ;)
- Cons:
    - we have to `await` results
    - it is quite complicated
- basic usage:
    - we define functions, `with` blocks, for loops etc. with the `async` keyword
    - we `await` the results of async functions (async generators are iterated with `async for`)
    - we sometimes need special asyncio versions of basic functions (e.g. `asyncio.sleep()` instead of `time.sleep()`)
- more in-depth information on asyncio can be found [here](https://realpython.com/async-io-python/)

In [1]:
import asyncio

In [2]:
# example:
async def fun(x):
    print(x)
    await asyncio.sleep(1)
    print(x*2)

In [3]:
fun(1)

<coroutine object fun at 0x7fc0f967f8c0>

In [4]:
await fun(1)

1
2


It can get much more complicated than this!
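As a small illustration of the "parallelize to a certain extent" point, the sketch below runs three coroutines concurrently with `asyncio.gather()`. The function `fun` is a variant of the example above that returns its result instead of printing it, and `main` is a name chosen only for this illustration.

```python
import asyncio
import time

async def fun(x):
    # variant of the example above: wait briefly, then return a value
    await asyncio.sleep(0.1)
    return x * 2

async def main():
    start = time.perf_counter()
    # gather() schedules all three coroutines at once and waits for all of them;
    # the results come back in the order the coroutines were passed in
    results = await asyncio.gather(fun(1), fun(2), fun(3))
    elapsed = time.perf_counter() - start
    return results, elapsed

# In a notebook cell:   results, elapsed = await main()
# In a plain script:    results, elapsed = asyncio.run(main())
```

Because the three waits overlap, the total time should be close to a single `sleep` rather than the sum of all three.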

## Time and date in Python
- many APIs and packages work with datetime objects that take into account
    - time
    - date
    - timezone
- these formats can also be used to plot data by time (in R or Python)
- two commonly used representations are
    - `datetime.datetime`
        - understandable by humans
        - we can easily access components (day, hour, etc.)
    - timestamps (seconds since the start of the epoch on 01.01.1970 00:00:00 UTC)
        - supplied by `time.time()`
- the two formats can be converted into each other
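A minimal sketch of the conversion in both directions, with the timezone made explicit. The timestamp value is the one printed further down in this notebook, rounded to whole seconds; passing `tz=timezone.utc` is a choice made for this example so that the result does not depend on the local timezone of the machine.

```python
from datetime import datetime, timezone

ts = 1670233169.0  # a timestamp like the one returned by time.time()

# timestamp -> timezone-aware datetime in UTC
as_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(as_dt)

# datetime -> timestamp again
back = as_dt.timestamp()
print(back == ts)
```

Without the `tz` argument, `fromtimestamp()` returns a naive datetime in local time, which is why the same timestamp can show a different hour on a different machine.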

### How do we get Python to understand our time requirements?
- We can supply time information as a string and let datetime parse it for us
    - `dt.strptime(str, format)`
- We can also transform datetime objects back to strings
    - `dt.strftime(format)`
- The format string has to be built from specific [format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)

#### a few examples:

In [46]:
from datetime import datetime as dt
import time

In [47]:
today = time.time()

In [49]:
print(type(today), today)

<class 'float'> 1670233169.329538


In [55]:
today_dt = dt.fromtimestamp(today)
print(today_dt.day, today_dt.month, today_dt.year)

5 12 2022


In [57]:
today_dt

datetime.datetime(2022, 12, 5, 10, 39, 29, 329538)

In [60]:
basic_format = '%Y-%m-%d %H:%M:%S'

In [61]:
today_str = today_dt.strftime(basic_format)
print(today_str)

2022-12-05 10:39:29


#### and the other way round:

In [64]:
target_time = dt.strptime('2022-12-05 09:00:00', basic_format)
print(target_time)

2022-12-05 09:00:00
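One detail worth keeping in mind: `strptime()` with this format returns a naive datetime (no timezone), while the dates Telegram returns further down in this notebook carry `tzinfo=datetime.timezone.utc`. The sketch below shows what happens when the two kinds are compared and one possible way to attach UTC. The variable `message_date` is a made-up value for illustration, not data from a channel.

```python
from datetime import datetime as dt, timezone

basic_format = '%Y-%m-%d %H:%M:%S'

naive = dt.strptime('2022-12-05 09:00:00', basic_format)
print(naive.tzinfo)  # None: no timezone attached

# hypothetical timezone-aware date, in the shape Telegram returns
message_date = dt(2022, 12, 5, 10, 0, 0, tzinfo=timezone.utc)

# comparing naive and aware datetimes directly raises a TypeError
comparison_error = None
try:
    naive < message_date
except TypeError as err:
    comparison_error = err

# attaching UTC makes the two comparable (this assumes the string was meant as UTC)
aware = naive.replace(tzinfo=timezone.utc)
is_earlier = aware < message_date
print(is_earlier)
```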


## using telethon with asyncio

### connect to Telegram

In [5]:
from telethon import TelegramClient

In [6]:
key_path = '/Users/ungers/Documents/'

# read "key value" pairs (one per line) from the credentials file
access = dict()
with open(f"{key_path}tel.txt", "r") as f:
    for line in f:
        info = line.split()
        if len(info) < 2:
            continue  # skip blank or malformed lines
        access[info[0]] = info[1]

In [7]:
api_id = access.get('id')
api_hash = access.get('hash')

In [8]:
client = TelegramClient('session', api_id, api_hash)

### searching for channels

In [9]:
from telethon.tl.functions.contacts import SearchRequest

In [17]:
candidates = [
    'querdenken münster',
    'querdenken hannover',
    'querdenken hamm',
    '@V_Zelenskiy_official'
]

In [18]:
results = []
async with client: # connect once and reuse the logged-in client for all searches
    for candidate in candidates:
        result = await client(SearchRequest(candidate, limit=10))
        results.append(result)

In [19]:
for r in results:
    print(r.to_dict().get('results'))

[{'_': 'PeerChannel', 'channel_id': 1401905477}]
[{'_': 'PeerChannel', 'channel_id': 1204340395}, {'_': 'PeerChannel', 'channel_id': 1444655522}, {'_': 'PeerChannel', 'channel_id': 1225384437}, {'_': 'PeerChannel', 'channel_id': 1216047451}]
[]
[{'_': 'PeerChannel', 'channel_id': 1463721328}, {'_': 'PeerChannel', 'channel_id': 1666349486}, {'_': 'PeerUser', 'user_id': 5231283513}, {'_': 'PeerChannel', 'channel_id': 1705583914}, {'_': 'PeerChannel', 'channel_id': 1491537685}, {'_': 'PeerChannel', 'channel_id': 1782691297}, {'_': 'PeerChannel', 'channel_id': 1658667793}, {'_': 'PeerChannel', 'channel_id': 1811975001}, {'_': 'PeerChannel', 'channel_id': 1792252228}, {'_': 'PeerChannel', 'channel_id': 1823209522}]


### Be careful with the results:
- Are these actually the channels that you want?
- You can check their `verified` status or other metadata

#### channel metadata

In [67]:
zel_search = results[3].to_dict().get('chats')

In [69]:
print(type(zel_search))

<class 'list'>


In [71]:
zel_search[0] # -> a dictionary!

{'_': 'Channel',
 'id': 1463721328,
 'title': 'Zelenskiy / Official',
 'photo': {'_': 'ChatPhoto',
  'photo_id': 5391153515738547595,
  'dc_id': 2,
  'has_video': False,
  'stripped_thumb': b'\x01\x08\x08\xb0\xd7\t\xbc6\x08\x18\xc1\x1e\xf4QE+\x0e\xe7'},
 'date': datetime.datetime(2019, 7, 30, 10, 57, 28, tzinfo=datetime.timezone.utc),
 'creator': False,
 'left': True,
 'broadcast': True,
 'verified': True,
 'megagroup': False,
 'restricted': False,
 'signatures': False,
 'min': False,
 'scam': False,
 'has_link': False,
 'has_geo': False,
 'slowmode_enabled': False,
 'call_active': False,
 'call_not_empty': False,
 'fake': False,
 'gigagroup': False,
 'noforwards': False,
 'join_to_send': False,
 'join_request': False,
 'access_hash': -6123649875255382059,
 'username': 'V_Zelenskiy_official',
 'restriction_reason': [],
 'admin_rights': None,
 'banned_rights': None,
 'default_banned_rights': None,
 'participants_count': 998837}

In [73]:
import pandas as pd

In [76]:
pd.DataFrame(zel_search)

Unnamed: 0,_,id,title,photo,date,creator,left,broadcast,verified,megagroup,...,noforwards,join_to_send,join_request,access_hash,username,restriction_reason,admin_rights,banned_rights,default_banned_rights,participants_count
0,Channel,1463721328,Zelenskiy / Official,"{'_': 'ChatPhoto', 'photo_id': 539115351573854...",2019-07-30 10:57:28+00:00,False,True,True,True,False,...,False,False,False,-6123649875255382059,V_Zelenskiy_official,[],,,,998837
1,Channel,1666349486,Zelenskiy / Official,"{'_': 'ChatPhoto', 'photo_id': 528102109245615...",2022-03-09 15:37:34+00:00,False,True,True,False,False,...,False,False,False,5499308810924775981,Zelenskyy_Volodymyr,[],,,,673
2,Channel,1705583914,https://t.me/V_Zelenskiy_official,"{'_': 'ChatPhoto', 'photo_id': 526925862014141...",2022-03-05 23:03:35+00:00,False,True,True,False,False,...,False,False,False,8134000323249237712,sluganarod,[],,,,33
3,Channel,1491537685,Zelenskiy / Official,"{'_': 'ChatPhoto', 'photo_id': 532125508336469...",2020-11-23 03:40:46+00:00,False,True,True,False,False,...,False,False,False,4801566023759222217,V_Zelenskiy_officiaI,[],,,,1
4,Channel,1782691297,Zelenskiy / Official,"{'_': 'ChatPhoto', 'photo_id': 545622876855846...",2022-04-17 15:42:51+00:00,False,True,True,False,False,...,False,False,False,555692733303232414,V_Zelensky_official,[],,,,12
5,Channel,1658667793,Zelenskiy / Gospodar,"{'_': 'ChatPhoto', 'photo_id': 535703600923107...",2022-04-04 14:37:31+00:00,False,True,True,False,False,...,False,False,False,6807713629836820087,V_Zelenskiy_officiall,[],,,,7
6,Channel,1811975001,Vладимир Zеленский,"{'_': 'ChatPhoto', 'photo_id': 519966653060492...",2022-11-28 17:01:36+00:00,False,True,True,False,False,...,False,False,False,6153153471697143018,VZ_Zelenskiy_Official,[],,,,1
7,Channel,1792252228,Zelenskiy / Official,"{'_': 'ChatPhoto', 'photo_id': 524696897682576...",2022-02-26 05:55:49+00:00,False,True,True,False,False,...,False,False,False,4279123952589928120,V_Zelenskiyy_official,[],,,,3
8,Channel,1823209522,Владимир Зеленский,"{'_': 'ChatPhoto', 'photo_id': 532133277932308...",2022-10-03 02:12:57+00:00,False,True,True,False,False,...,False,False,False,4217281352986968766,V_Zelenskiy_official1,[],,,,1
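The table above shows why the check matters: several usernames differ from the real one by a single character. A minimal sketch of how the metadata could be used to rank candidates, working on plain dictionaries reduced to three keys from the output above. The helper `rank_chats` is hypothetical and only illustrates one possible ordering (verified first, then by size).

```python
# reduced copies of three chat dictionaries from the search result above
chats = [
    {'username': 'sluganarod', 'verified': False, 'participants_count': 33},
    {'username': 'V_Zelenskiy_official', 'verified': True, 'participants_count': 998837},
    {'username': 'Zelenskyy_Volodymyr', 'verified': False, 'participants_count': 673},
]

def rank_chats(chats):
    """Hypothetical helper: verified chats first, each group sorted by size."""
    def size(chat):
        # participants_count can be missing or None, so fall back to 0
        return chat.get('participants_count') or 0

    verified = sorted((c for c in chats if c.get('verified')), key=size, reverse=True)
    others = sorted((c for c in chats if not c.get('verified')), key=size, reverse=True)
    return verified + others

ranked = rank_chats(chats)
for chat in ranked:
    print(chat['username'], chat['verified'], chat['participants_count'])
```

A ranking like this only narrows the list down; the final decision about which channel is the right one still needs a manual look.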


### collecting messages

In [118]:
start_date = dt.strptime('2022-02-24 00:00:00', basic_format)

In [119]:
true_zel = results[3].chats[0]

In [120]:
async with client:
    messages = await client.get_messages(
        true_zel,
        reverse=True,
        offset_date=start_date,
        limit=500,
        #offset_id=offset
    )

In [121]:
type(messages)

telethon.helpers.TotalList

In [122]:
dicts = []
for message in messages:
    m = message.to_dict()
    dicts.append(m)

In [123]:
pd.DataFrame(dicts)

Unnamed: 0,_,id,peer_id,date,message,out,mentioned,media_unread,silent,post,...,entities,views,forwards,replies,edit_date,post_author,grouped_id,reactions,restriction_reason,ttl_period
0,Message,725,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 04:42:52+00:00,Ми – це Україна !,False,False,False,False,True,...,[],3311528,41069,,NaT,,,,[],
1,Message,726,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 06:48:58+00:00,Я буду щогодини повідомляти вам актуальну і до...,False,False,False,False,True,...,[],3198333,19559,,NaT,,,,[],
2,Message,727,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 15:35:31+00:00,Не Україна обрала шлях війни. Але Україна проп...,False,False,False,False,True,...,[],2595219,9948,,NaT,,,,[],
3,Message,728,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 21:26:30+00:00,​​Закликав лідерів ЄС – учасників надзвичайног...,False,False,False,False,True,...,"[{'_': 'MessageEntityTextUrl', 'offset': 0, 'l...",2073752,3284,,NaT,,,,[],
4,Message,729,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 22:35:02+00:00,Сьогодні Росія атакувала всю територію України...,False,False,False,False,True,...,[],3430887,12440,,NaT,,,,[],
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Message,1229,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-04-12 18:08:16+00:00,Проведено спецоперацію завдяки СБУ. \nМолодці!...,False,False,False,False,True,...,[],6236642,92245,,NaT,,,,[],
496,Message,1230,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-04-12 21:21:24+00:00,"Дуже символічно, що саме в День космонавтики б...",False,False,False,False,True,...,[],4958974,10501,,NaT,,,,[],
497,Message,1231,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-04-12 23:40:37+00:00,It is very symbolic that Mr. Medvedchuk was de...,False,False,False,False,True,...,[],624482,352,,NaT,,,,[],
498,Message,1232,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-04-13 10:58:59+00:00,Я закликаю Естонію при затвердженні нового сан...,False,False,False,False,True,...,[],3448526,1135,,NaT,,,,[],
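The full message dictionaries contain many fields that are rarely needed for analysis. The sketch below keeps only a handful of them before building a table. The two dictionaries are shortened copies of the first two rows above (the second message text is truncated exactly as in the display), and the names `KEEP` and `slim` are made up for this example.

```python
from datetime import datetime, timezone

# shortened copies of the first two message dictionaries shown above
messages = [
    {'_': 'Message', 'id': 725,
     'date': datetime(2022, 2, 24, 4, 42, 52, tzinfo=timezone.utc),
     'message': 'Ми – це Україна !', 'views': 3311528, 'forwards': 41069,
     'entities': []},
    {'_': 'Message', 'id': 726,
     'date': datetime(2022, 2, 24, 6, 48, 58, tzinfo=timezone.utc),
     'message': 'Я буду щогодини повідомляти вам актуальну і до...',
     'views': 3198333, 'forwards': 19559, 'entities': []},
]

KEEP = ('id', 'date', 'message', 'views', 'forwards')

def slim(message_dict):
    """Hypothetical helper: keep only the fields listed in KEEP."""
    # .get() returns None for fields a message does not have
    return {key: message_dict.get(key) for key in KEEP}

rows = [slim(m) for m in messages]
print(rows[0])
```

Passing `rows` to `pd.DataFrame` instead of the full dictionaries should give a narrower table with just these five columns.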


In [127]:
from tqdm.notebook import tqdm

In [130]:
offset = 0  # id of the last message collected so far (0 = start at start_date)

all_messages = []

while True:
    async with client:
        messages = await client.get_messages(
            true_zel,
            reverse=True,
            offset_date=start_date,
            limit=500,
            offset_id=offset
        )

    for m in tqdm(messages):
        all_messages.append(m.to_dict())

    # a batch shorter than the limit means we have reached the newest message
    # (this also covers an empty batch, where messages[-1] would fail)
    if len(messages) < 500:
        break

    # continue after the last message of this batch
    offset = messages[-1].id

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/500 [00:00<?, ?it/s]

  0%|          | 0/38 [00:00<?, ?it/s]
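The batching logic can be tried out without a network connection. The sketch below replaces the Telegram call with a stand-in function, assuming (as the loop above does) that with `reverse=True` the offset id marks the last message already seen. `fetch_batch`, `collect_all` and `fake_channel` are hypothetical names, and the message ids are generated for illustration.

```python
def fetch_batch(all_ids, offset_id, limit):
    """Stand-in for the Telegram request: the next `limit` ids after offset_id."""
    return [i for i in all_ids if i > offset_id][:limit]

def collect_all(all_ids, limit=500):
    collected = []
    offset_id = 0  # 0 means "start from the beginning"
    while True:
        batch = fetch_batch(all_ids, offset_id, limit)
        collected.extend(batch)
        # a short (or empty) batch means there is nothing left to fetch
        if len(batch) < limit:
            break
        # continue after the last id of this batch
        offset_id = batch[-1]
    return collected

# generated ids: two full batches of 500 plus a final batch of 238
fake_channel = list(range(725, 725 + 1238))
result = collect_all(fake_channel)
print(len(result))
```

Checking the batch length before reading `batch[-1]` keeps the loop safe when the total number of messages is an exact multiple of the limit. Telethon's `client.iter_messages` may be a simpler alternative here, since it requests messages in chunks internally.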

In [131]:
pd.DataFrame(all_messages)

Unnamed: 0,_,id,peer_id,date,message,out,mentioned,media_unread,silent,post,...,views,forwards,replies,edit_date,post_author,grouped_id,reactions,restriction_reason,ttl_period,action
0,Message,725,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 04:42:52+00:00,Ми – це Україна !,False,False,False,False,True,...,3311528.0,41069.0,,NaT,,,,[],,
1,Message,726,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 06:48:58+00:00,Я буду щогодини повідомляти вам актуальну і до...,False,False,False,False,True,...,3198333.0,19559.0,,NaT,,,,[],,
2,Message,727,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 15:35:31+00:00,Не Україна обрала шлях війни. Але Україна проп...,False,False,False,False,True,...,2595219.0,9948.0,,NaT,,,,[],,
3,Message,728,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 21:26:30+00:00,​​Закликав лідерів ЄС – учасників надзвичайног...,False,False,False,False,True,...,2073752.0,3284.0,,NaT,,,,[],,
4,Message,729,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-02-24 22:35:02+00:00,Сьогодні Росія атакувала всю територію України...,False,False,False,False,True,...,3430887.0,12440.0,,NaT,,,,[],,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3533,Message,4282,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-12-04 09:38:19+00:00,,False,False,False,False,True,...,614715.0,282.0,,NaT,,1.336117e+16,,[],,
3534,Message,4283,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-12-04 09:38:19+00:00,,False,False,False,False,True,...,615093.0,297.0,,NaT,,1.336117e+16,,[],,
3535,Message,4284,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-12-04 09:38:19+00:00,,False,False,False,False,True,...,617179.0,278.0,,NaT,,1.336117e+16,,[],,
3536,Message,4285,"{'_': 'PeerChannel', 'channel_id': 1463721328}",2022-12-04 19:28:25+00:00,"Минає четвертий день цієї зими. Зими, яка буде...",False,False,False,False,True,...,2222678.0,1164.0,,2022-12-04 19:28:35+00:00,,,,[],,
