There are three ways to write table headers (for more information, see the docs: https://en.wikipedia.org/wiki/Help:Table).

#### Form 1:

```
{| class="wikitable"
|+ Caption text
|-
! Header text !! Header text !! Header text
|-
| Example || Example || Example
|-
| Example || Example || Example
|-
| Example || Example || Example
|}
```

#### Form 2:

```
{| class="wikitable"
|+ Caption text
! Header text
! Header text
! Header text
|-
| Example || Example || Example
|-
| Example || Example || Example
|-
| Example || Example || Example
|}
```

#### Form 3:

```
{| class="wikitable"
|+ Caption text
|-
! Header text
! Header text
! Header text
|-
| Example || Example || Example
|-
| Example || Example || Example
|-
| Example || Example || Example
|}
```
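
The three forms differ only in whether a `|-` row marker precedes the header cells and whether the header cells share one line (`!!`) or sit on separate lines. A minimal sketch of telling them apart from the raw wikitext, without `mwparserfromhell`. The function name `header_form` is hypothetical, and the heuristic assumes a simple table like the ones above:

```python
from typing import Optional


def header_form(wikitext: str) -> Optional[int]:
    """Return 1, 2 or 3 for the header forms above, or None if no header is found.

    Hypothetical helper: it only inspects the lines before the first data cell.
    """
    saw_row_marker = False
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        if line.startswith("|-"):
            saw_row_marker = True
        elif line.startswith("!"):
            if "!!" in line:
                return 1  # several header cells on one line
            return 3 if saw_row_marker else 2
        elif line.startswith("|}"):
            break  # end of table
        elif line.startswith("|") and not line.startswith("|+"):
            break  # first data cell reached without seeing a header
    return None
```

The caption line (`|+`) is skipped on purpose, since it can appear before the headers in all three forms.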

#### Form 4 ???


In [27]:
import mwparserfromhell
from IPython.display import HTML

In [28]:
x = """
{| class="wikitable plainrowheaders" style="text-align:center"
|+Manchester United's honours
! style="width:1%" |Type
! style="width:5%" |Competition
! style="width:1%" |Titles
! style="width:21%" |Seasons
|-
| rowspan="5" |Domestic
! scope="col" |First Division/Premier League<ref name="premier_league2" group="nb">Upon its formation in 1992, the Premier League became the top tier of English football; the Football League First and Second Divisions then became the second and third tiers, respectively. From 2004, the First Division became the Championship and the Second Division became League One.</ref>
| style="background-color:gold" |20
| align="left" |1907–08, 1910–11, 1951–52, 1955–56, 1956–57, 1964–65, 1966–67, 1992–93, 1993–94, 1995–96, 1996–97, 1998–99, 1999–2000, 2000–01, 2002–03, 2006–07, 2007–08, 2008–09, 2010–11, 2012–13
|-
! scope="col" |Second Division<ref name="premier_league2" group="nb" />
|2
| align="left" |1935–36, 1974–75
|-
! scope="col" |FA Cup
|13
| align="left" |1908–09, 1947–48, 1962–63, 1976–77, 1982–83, 1984–85, 1989–90, 1993–94, 1995–96, 1998–99, 2003–04, 2015–16, 2023–24
|-
! scope="col" |Football League Cup/EFL Cup
|6
| align="left" |1991–92, 2005–06, 2008–09, 2009–10, 2016–17, 2022–23
|-
! scope="col" |FA Charity Shield/FA Community Shield
| style="background-color:gold" |21
| align="left" |1908, 1911, 1952, 1956, 1957, 1965*, 1967*, 1977*, 1983, 1990*, 1993, 1994, 1996, 1997, 2003, 2007, 2008, 2010, 2011, 2013, 2016 (* shared)
|-
| rowspan="4" |Continental
! scope="col" |European Cup/UEFA Champions League
|3
| align="left" |1967–68, 1998–99, 2007–08
|-
! scope="col" |European Cup Winners' Cup
|1
| align="left" |1990–91
|-
! scope="col" |UEFA Europa League
|1
| align="left" |2016–17
|-
! scope="col" |UEFA Super Cup
|1
| align="left" |1991
|-
| rowspan="2" |Worldwide
! scope="col" |FIFA Club World Cup
|1
| align="left" |2008
|-
! scope="col" |Intercontinental Cup
|1
| align="left" |1999
|}
"""

x_2 = """
{| class="wikitable"
|+ Caption text
|-
! Header text
! Header text
! Header text
|-
| Example || Example || Example
|-
| Example || Example || Example
|-
| Example || Example || Example
|}
"""

In [29]:
def wiki_table_to_html_old(node):
    result = ['<table>']

    first_row = False
    header_loop = False

    for row in node.contents.nodes:

        # Conditions related to header loop
        # This condition indicates that the header loop will not be necessary
        if row.wiki_markup == "|-":
            first_row = True

        # This condition indicates that header loop was active but now it has finished (i.e., no more headers)
        if header_loop is True and isinstance(row, mwparserfromhell.nodes.Tag) and row.tag != 'th':
            header_loop = False
            result.append('</tr>')

        # Special caption case
        if (isinstance(row, mwparserfromhell.nodes.Tag)
            and row.tag == 'td'
            and row.contents.startswith('+')
            and row.wiki_markup == "|"
            and first_row is False
        ):
            caption_text = row.contents[1:].strip()  # Remove the '+' and strip whitespace (including \n)
            result.append(f'<caption>{caption_text}</caption>')

        # Special header case
        elif (isinstance(row, mwparserfromhell.nodes.Tag)
            and row.tag == 'th'
            and row.wiki_markup == "!"
            and first_row is False
        ):
            # If it is the first th, add <tr>, then <th>content</th>
            # In the last one, add </tr> to close the header
            if not header_loop:
                result.append('<tr>')
                header_loop = True # start the "loop"

            result.append('<th>')

            # Process the cell contents
            for content in row.contents.nodes:
                if isinstance(content, mwparserfromhell.nodes.Text):
                    result.append(str(content))

            # Close the header cell tag
            result.append('</th>')

        # Default case
        if isinstance(row, mwparserfromhell.nodes.Tag) and row.tag == 'tr':
            result.append('<tr>')
            for cell in row.contents.nodes:
                if isinstance(cell, mwparserfromhell.nodes.Tag) and cell.tag in ['td', 'th']:
                    # Extract only rowspan and colspan attributes
                    attrs = []
                    if any('rowspan' in attribute for attribute in cell.attributes):
                        for attribute in cell.attributes:
                            if 'rowspan' in attribute:
                                attrs.append(attribute.strip())
                                break
                    if any('colspan' in attribute for attribute in cell.attributes):
                        for attribute in cell.attributes:
                            if 'colspan' in attribute:
                                attrs.append(attribute.strip())
                                break

                    # Construct the opening tag with rowspan and colspan (if they exist)
                    attrs_str = ' '.join(attrs)
                    result.append(f'<{cell.tag} {attrs_str}>' if attrs_str else f'<{cell.tag}>')

                    # Process the cell contents
                    for content in cell.contents.nodes:
                        if isinstance(content, mwparserfromhell.nodes.Text):
                            result.append(str(content))

                    # Close the cell tag
                    result.append(f'</{cell.tag}>')
            result.append('</tr>')


    result.append('</table>')
    return ''.join(result)

In [None]:
def wiki_table_to_html(node):
    result = ['<table>']
    first_row = False
    header_loop = False

    for row in node.contents.nodes:

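        # Once the first row marker (`|-`) is seen, captions and bare header cells
        # are no longer expected at the top level, so the header loop is not needed
        if row.wiki_markup == "|-":
            first_row = True  # Mark that the first row has been encountered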

        # Check if the header loop is active and if the current row is not a header cell
        if header_loop is True and isinstance(row, mwparserfromhell.nodes.Tag) and row.tag != 'th':
            header_loop = False  # End the header loop
            result.append('</tr>')  # Close the header row

        # Handle captions
        if (isinstance(row, mwparserfromhell.nodes.Tag)
            and row.tag == 'td'
            and row.contents.startswith('+')
            and row.wiki_markup == "|"
            and first_row is False
        ):
            # Extract the caption text, removing the '+' and any leading/trailing whitespace
            caption_text = row.contents[1:].strip()
            result.append(f'<caption>{caption_text}</caption>')

        # Handle the special case for header cells
        elif (isinstance(row, mwparserfromhell.nodes.Tag)
            and row.tag == 'th'
            and row.wiki_markup == "!"
            and first_row is False
        ):
            # If this is the first header cell, start a new row and mark the header loop as active
            if not header_loop:
                result.append('<tr>')
                header_loop = True

            result.append('<th>')

            # Process the contents of the header cell
            for content in row.contents.nodes:
                if isinstance(content, mwparserfromhell.nodes.Text):
                    result.append(str(content))

            result.append('</th>')

        # Handle the default case for table rows
        if isinstance(row, mwparserfromhell.nodes.Tag) and row.tag == 'tr':
            result.append('<tr>')

            for cell in row.contents.nodes:
                if isinstance(cell, mwparserfromhell.nodes.Tag) and cell.tag in ['td', 'th']:
                    # Extract rowspan and colspan attributes if they exist
                    attrs = []
                    if any('rowspan' in attribute for attribute in cell.attributes):
                        for attribute in cell.attributes:
                            if 'rowspan' in attribute:
                                attrs.append(attribute.strip())
                                break
                    if any('colspan' in attribute for attribute in cell.attributes):
                        for attribute in cell.attributes:
                            if 'colspan' in attribute:
                                attrs.append(attribute.strip())
                                break

                    # Construct the opening cell tag with attributes (if any)
                    attrs_str = ' '.join(attrs)
                    result.append(f'<{cell.tag} {attrs_str}>' if attrs_str else f'<{cell.tag}>')

                    # Process the contents of the cell
                    for content in cell.contents.nodes:
                        if isinstance(content, mwparserfromhell.nodes.Text):
                            result.append(str(content))

                    # Close the cell tag
                    result.append(f'</{cell.tag}>')

            result.append('</tr>')  # Close the row

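    # If the table ended while header cells were still being collected,
    # close the header row that was left open
    if header_loop:
        result.append('</tr>')

    # Close the table tag and return the result as a single string
    result.append('</table>')
    return ''.join(result)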
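
Both converters keep only `rowspan` and `colspan`, and they find them with a substring test (`'rowspan' in attribute`). A minimal sketch of the same filter that matches the attribute name instead of a substring. `span_attrs` is a hypothetical helper, and taking plain strings as input is an assumption made to keep the example independent of `mwparserfromhell`:

```python
import re

# Matches rowspan/colspan written as name="N", name='N' or name=N (nothing else).
_SPAN_RE = re.compile(r'^\s*(rowspan|colspan)\s*=\s*["\']?(\d+)["\']?\s*$', re.IGNORECASE)


def span_attrs(attributes):
    """Return a normalized 'rowspan="N" colspan="M"' string (possibly empty)."""
    found = {}
    for attribute in attributes:
        match = _SPAN_RE.match(str(attribute))
        if match:
            # Keep the first occurrence of each name, like the `break` in the converter
            found.setdefault(match.group(1).lower(), match.group(2))
    return ' '.join(
        f'{name}="{found[name]}"' for name in ('rowspan', 'colspan') if name in found
    )
```

With this filter, an unrelated attribute whose value happens to contain the word `rowspan` is ignored, whereas the substring test would copy it into the output tag.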

In [31]:
wikicode = mwparserfromhell.parse(x)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)
# Render the HTML in Jupyter
HTML(html_text)

Type,Competition,Titles,Seasons
Domestic,First Division/Premier League,20,"1907–08, 1910–11, 1951–52, 1955–56, 1956–57, 1964–65, 1966–67, 1992–93, 1993–94, 1995–96, 1996–97, 1998–99, 1999–2000, 2000–01, 2002–03, 2006–07, 2007–08, 2008–09, 2010–11, 2012–13"
Domestic,Second Division,2,"1935–36, 1974–75"
Domestic,FA Cup,13,"1908–09, 1947–48, 1962–63, 1976–77, 1982–83, 1984–85, 1989–90, 1993–94, 1995–96, 1998–99, 2003–04, 2015–16, 2023–24"
Domestic,Football League Cup/EFL Cup,6,"1991–92, 2005–06, 2008–09, 2009–10, 2016–17, 2022–23"
Domestic,FA Charity Shield/FA Community Shield,21,"1908, 1911, 1952, 1956, 1957, 1965*, 1967*, 1977*, 1983, 1990*, 1993, 1994, 1996, 1997, 2003, 2007, 2008, 2010, 2011, 2013, 2016 (* shared)"
Continental,European Cup/UEFA Champions League,3,"1967–68, 1998–99, 2007–08"
Continental,European Cup Winners' Cup,1,1990–91
Continental,UEFA Europa League,1,2016–17
Continental,UEFA Super Cup,1,1991
Worldwide,FIFA Club World Cup,1,2008


# Test cases

## Test Case 1: Simple `rowspan`

In [None]:
wiki_table = """
{| class="wikitable"
|+ Test Case 1: Simple rowspan
|-
! Header 1
! Header 2
! Header 3
|-
| A1 || rowspan="2" | B1-B2 || C1
|-
| A2 || C2
|-
| A3 || B3 || C3
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

## Test case 2: Simple `colspan`

In [33]:
wiki_table = """
{| class="wikitable"
|+ Test Case 2: Simple colspan
|-
! Header 1
! Header 2
! Header 3
|-
| colspan="2" | A1-B1 || C1
|-
| A2 || B2 || C2
|-
| A3 || B3 || C3
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-B1,A1-B1,C1
A2,B2,C2
A3,B3,C3


## Test Case 3: Combined `rowspan` and `colspan`

In [34]:
wiki_table = """
{| class="wikitable"
|+ Test Case 3: Combined rowspan and colspan
|-
! Header 1
! Header 2
! Header 3
|-
| rowspan="2" | A1-A2 || colspan="2" | B1-C1
|-
| B2 || C2
|-
| A3 || B3 || C3
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-A2,B1-C1,B1-C1
A1-A2,B2,C2
A3,B3,C3


## Test Case 4: Multiple `rowspan` and `colspan`

In [35]:
wiki_table = """
{| class="wikitable"
|+ Test Case 4: Multiple rowspan and colspan
|-
! Header 1
! Header 2
! Header 3
|-
| rowspan="2" | A1-A2 || colspan="2" | B1-C1
|-
| B2 || C2
|-
| A3 || rowspan="2" | B3-B4 || C3
|-
| A4 || C4
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-A2,B1-C1,B1-C1
A1-A2,B2,C2
A3,B3-B4,C3
A4,B3-B4,C4


## Test Case 5: Complex Merging

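This case is unusual: both cells in the first data row span two rows, so the second row is empty (two consecutive `|-` markers). The output nevertheless looks correct.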

In [36]:
wiki_table = """
{| class="wikitable"
|+ Test Case 5: Complex Merging
|-
! Header 1
! Header 2
! Header 3
|-
| rowspan="2" | A1-A2 || colspan="2" rowspan="2" | B1-C2
|-
|-
| A3 || B3 || C3
|-
| A4 || rowspan="2" | B4-B5 || C4
|-
| A5 || C5
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-A2,B1-C2,B1-C2
A1-A2,B1-C2,B1-C2
A3,B3,C3
A4,B4-B5,C4
A5,B4-B5,C5


## Test Case 6: Edge Case - Single Cell with `rowspan` and `colspan`

In [37]:
wiki_table = """
{| class="wikitable"
|+ Test Case 6: Single Cell with rowspan and colspan
|-
! Header 1
! Header 2
! Header 3
|-
| colspan="2" rowspan="2" | A1-B2 || C1
|-
| C2
|-
| A3 || B3 || C3
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-B2,A1-B2,C1
A1-B2,A1-B2,C2
A3,B3,C3


## Test Case 7: Full Table Merge

In [38]:
wiki_table = """
{| class="wikitable"
|+ Test Case 7: Full Table Merge
|-
! Header 1
! Header 2
! Header 3
|-
| colspan="3" rowspan="3" | A1-C3
|-
|-
|-
| A4 || B4 || C4
|}
"""

wikicode = mwparserfromhell.parse(wiki_table)

html_text = wiki_table_to_html(wikicode.filter_tags(matches='table')[0])

# print(html_text)

# Render the HTML in Jupyter
HTML(html_text)

Header 1,Header 2,Header 3
A1-C3,A1-C3,A1-C3
A1-C3,A1-C3,A1-C3
A1-C3,A1-C3,A1-C3
A4,B4,C4
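
The flattened outputs above repeat a merged cell in every grid position it covers. A minimal sketch of that expansion, assuming HTML shaped like the output of `wiki_table_to_html` (only `tr`, `td`, `th`, `rowspan` and `colspan`). `expand_spans` and `_CellCollector` are hypothetical names, and the sketch relies only on the standard library's `html.parser`:

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect rows as lists of (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('td', 'th'):
            if not self.rows:  # tolerate cells that appear outside any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get('rowspan') or 1), int(a.get('colspan') or 1)]

    def handle_data(self, data):
        if self._cell is not None:  # text outside cells (e.g. the caption) is ignored
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append((''.join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_spans(html_text):
    """Return a list of rows in which every merged cell is repeated."""
    parser = _CellCollector()
    parser.feed(html_text)
    parser.close()
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip positions filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), '') for c in range(n_cols)] for r in range(n_rows)]
```

Applied to HTML equivalent to Test Case 3, this yields the rows `A1-A2, B1-C1, B1-C1` and `A1-A2, B2, C2`, matching the flattened output shown for that case.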


# Nested Tables

`mwparserfromhell` does not recognize nested tables. It treats the opening `{|` of an inner table as an ordinary node rather than as the start of a nested structure, so the parser cannot distinguish the outer table from the inner one.

This breaks any recursive approach, because the converter never sees the inner table as a table. Handling nested tables would require a solution written from scratch. A minimal example of such a table:

```
{|
|+ Outer Table
|-
| Outer Cell 1
| {|
  |+ Inner Table
  |-
  | Inner Cell 1 || Inner Cell 2
  |}
| Outer Cell 2
|}
```
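
As a possible starting point for a from-scratch solution, here is a minimal sketch that scans the raw wikitext for `{|` and `|}` and records each table's span and nesting depth with a stack. The name `find_table_spans` is hypothetical. The scanner assumes that table delimiters start a line, after optional whitespace and an optional cell marker `|` as in the example above, and it does not handle templates, `<nowiki>` or HTML `<table>` tags:

```python
import re

# `{|` at the start of a line, optionally preceded by whitespace and a cell marker `|`
_OPEN = re.compile(r'^[ \t]*(?:\|[ \t]*)?(\{\|)', re.MULTILINE)
# `|}` at the start of a line, optionally preceded by whitespace
_CLOSE = re.compile(r'^[ \t]*(\|\})', re.MULTILINE)


def find_table_spans(wikitext):
    """Return (start, end, depth) for every table, sorted by start offset."""
    events = [(m.start(1), m.end(1), 'open') for m in _OPEN.finditer(wikitext)]
    events += [(m.start(1), m.end(1), 'close') for m in _CLOSE.finditer(wikitext)]
    events.sort()

    stack, spans = [], []
    for start, end, kind in events:
        if kind == 'open':
            stack.append(start)
        elif stack:  # a stray `|}` without a matching `{|` is ignored
            opened_at = stack.pop()
            spans.append((opened_at, end, len(stack)))  # depth 0 = outermost table
    return sorted(spans)
```

Each span can then be sliced out of the text (`wikitext[start:end]`) and processed innermost first, so that an inner table is converted before the cell that contains it.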