# Redirects demo
*Note: the redirects architecture is currently being rewritten so that it works more smoothly with the other functions.*

We demo three functions for collecting redirect data from Wikipedia:
1. `basic_info`: Collects basic information about Wikipedia pages, including the title, page ID, and namespace, with further options.
2. `fix_redirects`: Resolves incorrect or outdated article titles to their current titles.
3. `get_redirects`: Collects all redirects for the given Wikipedia articles.

## Setup

```python
import mwapi
import wikitools
import pandas as pd

my_agent = 'mwapi testing <p.gildersleve@lse.ac.uk>'
async_session = mwapi.AsyncSession('https://en.wikipedia.org',
                                   formatversion=2, user_agent=my_agent)
toparts = pd.read_csv('data/topviews-2024_07_31.csv')
artlist = toparts['Page'].unique().tolist()  # ~1000 most-viewed articles (topviews for 2024-07-31)
```

## `basic_info`

```python
basic_info = await wikitools.basic_info(async_session, artlist)
pd.DataFrame(basic_info[0]).head()
```

|   | pageid | ns | title        |
|---|--------|----|--------------|
| 0 | 3356   | 0  | Bill Clinton |
| 1 | 9232   | 0  | Eiffel Tower |
| 2 | 12551  | 0  | Gymnastics   |
| 3 | 15992  | 0  | Jimmy Carter |
| 4 | 21148  | 0  | Netherlands  |
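With ~1000 titles in `artlist`, a function like `basic_info` cannot send them all in one request: the MediaWiki Action API accepts at most 50 values for `titles` per `action=query` request (500 for clients with higher limits, such as bots). The sketch below shows one way titles could be chunked into request parameters, assuming the standard query API. `chunk_titles` and `build_query_params` are hypothetical helpers for illustration, not part of `wikitools`.

```python
from typing import Iterator

# Regular clients may pass at most 50 titles per action=query request;
# clients with higher API limits (e.g. bots) may pass up to 500.
MAX_TITLES_PER_REQUEST = 50


def chunk_titles(titles: list[str], size: int = MAX_TITLES_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of `titles` with at most `size` items each."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def build_query_params(batch: list[str]) -> dict[str, str]:
    """Hypothetical helper: parameters for one action=query request.

    Multiple titles are joined with '|', as the Action API expects.
    formatversion is omitted because the session above already sets it.
    """
    return {"action": "query", "titles": "|".join(batch)}


# Usage with dummy titles standing in for `artlist`
titles = [f"Page {i}" for i in range(120)]
print([len(b) for b in chunk_titles(titles)])  # [50, 50, 20]
print(build_query_params(["Bill Clinton", "Eiffel Tower"]))
```

Each parameter dict could then be passed as keyword arguments to the session's `get` method, one request per batch.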


## `fix_redirects`

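Resolves incorrect or outdated article titles to their current titles. Results are written into the supplied dictionaries, which the function modifies in place: `norm_map` maps each input title to its normalised form (e.g. with the first letter capitalised), `redirect_map` maps normalised titles to their redirect targets, and `id_map` maps the final titles to their page IDs.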

```python
rd_arts = ["kamala harris", "joe biden", "uk"]  # example titles that need normalising or redirecting
redirect_map = {}
norm_map = {}
id_map = {}
await wikitools.fix_redirects(async_session, titles=rd_arts, redirect_map=redirect_map,
                              norm_map=norm_map, id_map=id_map)

print(norm_map)
print(redirect_map)
print(id_map)
```


```
{'kamala harris': 'Kamala harris', 'uk': 'Uk', 'joe biden': 'Joe biden'}
{'Uk': 'United Kingdom', 'Joe biden': 'Joe Biden', 'Kamala harris': 'Kamala Harris'}
{'United Kingdom': 31717, 'Joe Biden': 145422, 'Kamala Harris': 3120522}
```
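The three maps mirror the `normalized`, `redirects`, and `pages` lists that the MediaWiki API returns for `action=query` with `redirects=1`. The sketch below shows how such maps could be filled from one response and then chained to resolve a raw title, assuming the formatversion=2 response shape. The response dict is hand-built from the values printed above (other fields omitted), and `fill_maps` and `resolve` are hypothetical helpers, not the `wikitools` implementation.

```python
# Illustrative formatversion=2 response, hand-built from the output above
# (fields such as "fromencoded" and "batchcomplete" omitted).
response = {
    "query": {
        "normalized": [
            {"from": "kamala harris", "to": "Kamala harris"},
            {"from": "joe biden", "to": "Joe biden"},
            {"from": "uk", "to": "Uk"},
        ],
        "redirects": [
            {"from": "Kamala harris", "to": "Kamala Harris"},
            {"from": "Joe biden", "to": "Joe Biden"},
            {"from": "Uk", "to": "United Kingdom"},
        ],
        "pages": [
            {"pageid": 3120522, "ns": 0, "title": "Kamala Harris"},
            {"pageid": 145422, "ns": 0, "title": "Joe Biden"},
            {"pageid": 31717, "ns": 0, "title": "United Kingdom"},
        ],
    }
}


def fill_maps(response: dict, norm_map: dict, redirect_map: dict, id_map: dict) -> None:
    """Hypothetical helper: update the three maps in place from one query response."""
    query = response.get("query", {})
    for item in query.get("normalized", []):
        norm_map[item["from"]] = item["to"]
    for item in query.get("redirects", []):
        redirect_map[item["from"]] = item["to"]
    for page in query.get("pages", []):
        # Missing pages are flagged with "missing" and have no pageid, so skip them.
        if "pageid" in page:
            id_map[page["title"]] = page["pageid"]


def resolve(title: str, norm_map: dict, redirect_map: dict, id_map: dict):
    """Apply normalisation, then follow one redirect hop; return (title, pageid or None)."""
    title = norm_map.get(title, title)
    title = redirect_map.get(title, title)
    return title, id_map.get(title)


norms, redirects, ids = {}, {}, {}
fill_maps(response, norms, redirects, ids)
print(resolve("uk", norms, redirects, ids))  # ('United Kingdom', 31717)
```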


## `get_redirects`

This collects all redirect pages pointing to the given articles; each article's own title is listed first. Again, the function edits the supplied dictionary in place. It does not need to be preceded by a call to `fix_redirects`.
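One way to list redirects through the MediaWiki Action API is `prop=redirects` with `rdlimit=max`; long lists (such as the one for United Kingdom) arrive over several responses linked by a `continue` block, and mwapi's `get` has a `continuation=True` option that yields such responses in turn. The sketch below shows only the merging step, assuming formatversion=2 responses. The two response dicts are hand-built from the Errol Morris output further down and trimmed to `title` fields, and `merge_redirect_responses` is a hypothetical helper, not the `wikitools` implementation.

```python
def merge_redirect_responses(responses: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: combine prop=redirects responses into {target: [titles]}.

    Each target's own title is listed first, matching the layout of
    `collected_redirects` in this notebook.
    """
    merged: dict[str, list[str]] = {}
    for response in responses:
        for page in response.get("query", {}).get("pages", []):
            titles = merged.setdefault(page["title"], [page["title"]])
            # A page can appear in every continued response but only carries
            # a "redirects" list in the responses that include some of them.
            for redirect in page.get("redirects", []):
                titles.append(redirect["title"])
    return merged


# Illustrative continuation sequence (token value is a placeholder)
responses = [
    {
        "continue": {"rdcontinue": "<placeholder>"},
        "query": {"pages": [{"title": "Errol Morris",
                             "redirects": [{"title": "Errol morris"}, {"title": "Erol Morris"}]}]},
    },
    {
        "query": {"pages": [{"title": "Errol Morris",
                             "redirects": [{"title": "Erroll Morris"}, {"title": "Errol Mark Morris"}]}]},
    },
]
print(merge_redirect_responses(responses))
# {'Errol Morris': ['Errol Morris', 'Errol morris', 'Erol Morris', 'Erroll Morris', 'Errol Mark Morris']}
```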

```python
collected_redirects = {}
await wikitools.get_redirects(async_session, rd_arts, redirect_map=redirect_map,
                              norm_map=norm_map, id_map=id_map,
                              collected_redirects=collected_redirects)
collected_redirects
```

```
{'United Kingdom': ['United Kingdom',
  'United Kindom',
  'U.K.',
  'ISO 3166-1:GB',
  'U.K',
  'United Kingom',
  'Uk',
  'Great Britain and Northern Ireland',
  'The UK',
  'UK',
  'The United Kingdom',
  "UK's",
  'United Kingdom of Great Britain and Northern Island',
  "United Kingdom's",
  'UnitedKingdom',
  'United kingdom of great britain and northern ireland',
  'United Kingsom',
  'British state',
  'TUKOGBANI',
  'United Kingdom of Great Britain and Northern Ireland',
  'The United Kingdom of Great Britain and Northern Ireland',
  'United Kingdom of Great Britain & Northern Ireland',
  'United kingdom',
  'United Kindgom',
  'Great britain and northern ireland',
  'UKGBNI',
  'U.K.G.B.N.I.',
  'The uk',
  'Royaume-Uni',
  'UKOGBANI',
  'United Kingdom of Great Britain and Ulster',
  'Great Britain and Ulster',
  'Great Britain & Ulster',
  'United Kingdom of Great Britain & Ulster',
  'The United Kingdom of Great Britain & Ulster',
  'United kingom',
  'Reino Unido',
  'Regn
```

```python
# also works with page IDs
collected_redirects = {}
ids = [736, 9332, 60815369]
await wikitools.get_redirects(async_session, pageids=ids, redirect_map=redirect_map,
                              norm_map=norm_map, id_map=id_map,
                              collected_redirects=collected_redirects)
collected_redirects
```

```
{'Albert Einstein': ['Albert Einstein',
  'Einstein',
  'Albert Eienstein',
  'Albert Einstien',
  'Albert einstein',
  'Einstien',
  'Einsteinian',
  'Einsetein',
  'Albert Enstein',
  "Albert Einstein's",
  'Einstein, Albert',
  'Albert Enstien',
  'Alber Enstien',
  'Albert Einstin',
  'A. Einstein',
  'Alber Einstein',
  'Einstein (physicist)',
  'Albrecht Einstein',
  'Albert eintein',
  'Chasing a light beam',
  'I want to go when I want. It is tasteless to prolong life artificially. I have done my share, it is time to go. I will do it elegantly.',
  'Dr. Albert Einstein',
  'Dr Albert Einstein',
  'Dr. Einstein',
  'Dr Einstein'],
 'Errol Morris': ['Errol Morris',
  'Errol morris',
  'Erol Morris',
  'Erroll Morris',
  'Errol Mark Morris'],
 'Daisy Edgar-Jones': ['Daisy Edgar-Jones', 'Draft:Daisy Edgar-Jones']}
```
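A common next step is to invert `collected_redirects` so that any redirect title can be mapped back to its target article, for example when aggregating pageviews recorded under redirect titles. A minimal sketch, assuming `collected_redirects` maps each target title to a list of titles as shown above; `invert_redirects` is a hypothetical helper, not part of `wikitools`.

```python
def invert_redirects(collected_redirects: dict[str, list[str]]) -> dict[str, str]:
    """Hypothetical helper: map every listed title (redirect or target) to its target.

    If a title were listed under two targets, the later one would win.
    """
    lookup: dict[str, str] = {}
    for target, titles in collected_redirects.items():
        for title in titles:
            lookup[title] = target
    return lookup


# Small excerpt of the output above
collected = {
    'Errol Morris': ['Errol Morris', 'Errol morris', 'Erol Morris',
                     'Erroll Morris', 'Errol Mark Morris'],
    'Daisy Edgar-Jones': ['Daisy Edgar-Jones', 'Draft:Daisy Edgar-Jones'],
}
lookup = invert_redirects(collected)
print(lookup['Erol Morris'])              # Errol Morris
print(lookup['Draft:Daisy Edgar-Jones'])  # Daisy Edgar-Jones
```

With pandas, a column of titles could then be canonicalised with `df['Page'].map(lookup).fillna(df['Page'])`, leaving titles absent from the lookup unchanged.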