Cannot "parse" wikimedia #20

tmcfarlane · 2016-08-31T20:12:26Z

Summary

If you load the default input by refreshing the page, switch the output style to wikimedia, and hit "parse" on the output, you get an error prompt. When you hit "OK", the input goes blank but the output remains. Switching the output style afterwards affects neither the input nor the output.

Investigation

Current default input (with correct tabs):

Col1    Col2    Col3    Numeric Column
Value 1 Value 2 123 10.0
Separate    cols    with a tab or 4 spaces  -2,027.1
This is a row with only one cell

Current wikitable output of above input:

{| class="wikitable"

! Col1                             
! Col2    
! Col3                   
! Numeric Column 
|-

| Value 1                          
| Value 2 
| 123                    
| 10.0           
|-

| Separate                         
| cols    
| with a tab or 4 spaces 
| -2,027.1       
|-

| This is a row with only one cell 
|         
|                        
|                
|}

Prompt


dwesely · 2016-09-01T01:14:19Z

It looks like tables can only be parsed under specific circumstances at the moment. There must be a first line containing a distinct character that indicates where the columns are, and all the remaining column separators must line up with the separators in that header line.

A more general solution might be to compare the lines and look for (non-alphanumeric?) characters that are the same at a given position all the way from the bottom to the top of the table to be parsed, or at least are tied for the most occurrences in a single column.
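As a rough illustration of that idea, here is a minimal sketch covering only the strict case, where the same non-alphanumeric character sits at the same position on every line. The function names `findSharedSeparatorColumns` and `splitAtColumns` are hypothetical and are not part of the existing code. The "tied for the most" variant is not attempted here.

```javascript
// Hypothetical helper: return the character positions that hold the same
// non-alphanumeric, non-whitespace character on every non-blank line.
function findSharedSeparatorColumns(lines) {
  const rows = lines.filter((line) => line.trim() !== "");
  if (rows.length === 0) return [];

  // Only positions present on every row can be shared separators.
  const width = Math.min(...rows.map((row) => row.length));
  const columns = [];
  for (let i = 0; i < width; i++) {
    const ch = rows[0][i];
    const isCandidate = !/[A-Za-z0-9\s]/.test(ch);
    if (isCandidate && rows.every((row) => row[i] === ch)) {
      columns.push(i);
    }
  }
  return columns;
}

// Hypothetical helper: cut one line at the given positions and trim each cell.
function splitAtColumns(line, columns) {
  const cells = [];
  let start = 0;
  for (const col of columns) {
    cells.push(line.slice(start, col).trim());
    start = col + 1;
  }
  cells.push(line.slice(start).trim());
  return cells;
}

const sample = ["a|bb|c", "d|ee|f"];
const separators = findSharedSeparatorColumns(sample);
console.log(sample.map((line) => splitAtColumns(line, separators)));
```

A table whose lines start or end with a separator would produce empty first or last cells with this approach, so a real implementation would probably want to drop those.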

Dealing with HTML and wikimedia syntax would take a bit more work. There is a JavaScript implementation of an HTML-to-CSV parser here: https://gist.github.com/adilapapaya/9787842
I didn't see a wikimedia table parser written in javascript, but I suspect I'm just not using the right search terms.
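For the specific one-cell-per-line layout in the wikitable output quoted at the top of this issue, a parser could be fairly small. The sketch below is an assumption about what that might look like, with a hypothetical function name `parseSimpleWikitable`. It deliberately ignores the other forms wikimedia allows (`||` and `!!` inline cells, `|+` captions, cell attributes), so it is not a general wikimedia parser.

```javascript
// Hypothetical sketch: parse only the simple wikitable layout where each
// header ("!") or data ("|") cell is on its own line and "|-" ends a row.
function parseSimpleWikitable(text) {
  const rows = [];
  let current = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("{|")) continue; // blank line or table start
    if (line.startsWith("|}")) break; // table end
    if (line.startsWith("|-")) {
      // Row separator: flush the cells collected so far.
      if (current.length > 0) rows.push(current);
      current = [];
      continue;
    }
    if (line.startsWith("!") || line.startsWith("|")) {
      current.push(line.slice(1).trim());
    }
  }
  if (current.length > 0) rows.push(current);
  return rows;
}

const wikitable = [
  '{| class="wikitable"',
  "! Col1",
  "! Numeric Column",
  "|-",
  "| Separate",
  "| -2,027.1",
  "|}",
].join("\n");
console.log(parseSimpleWikitable(wikitable));
```

Note that a cell written without a space after the pipe, such as `|-5`, would be mistaken for a row separator here.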

Checks if spaces are being used as vertical separators if no other separators are found. Corrected the recursion to remove the first element from the array each iteration. Combined notifications so only one alert box is ever shown when the user uses the parse functionality.

tmcfarlane · 2016-09-17T04:55:08Z

Regarding the original issue, I don't think parsing wikimedia is actually much of a priority here. Wikimedia supports so many different table formats, and I don't really understand why. It's almost like they had an old way, then changed it, and never removed the first one. But since you are attempting to parse in JavaScript, it can definitely get tricky, as you said. Since the table definitions aren't consistent, I'd just avoid that idea altogether.

I'd honestly disable the parse button when wikimedia is selected for now and just show a message below it letting people know it's not supported. This will keep the site looking good, though I don't know how many people besides myself have tried or would try this.
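Something along these lines might be enough. This is only a sketch: the names `UNPARSEABLE_STYLES`, `parseButtonState`, and `applyParseButtonState` are made up for illustration, and the button and message elements are assumptions about the page rather than its actual markup.

```javascript
// Hypothetical list of output styles the parser does not handle.
const UNPARSEABLE_STYLES = new Set(["wikimedia"]);

// Decide whether the parse button should be usable for a given output style.
function parseButtonState(outputStyle) {
  const supported = !UNPARSEABLE_STYLES.has(outputStyle);
  return {
    disabled: !supported,
    message: supported
      ? ""
      : "Parsing " + outputStyle + " output is not supported yet.",
  };
}

// Apply that state to a button and a message element (or any objects with
// `disabled` and `textContent` properties). Call it whenever the output
// style selection changes.
function applyParseButtonState(button, messageElement, outputStyle) {
  const state = parseButtonState(outputStyle);
  button.disabled = state.disabled;
  messageElement.textContent = state.message;
}
```

Keeping the decision in a plain function separate from the DOM update makes it easy to add other unsupported styles later.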

dwesely mentioned this issue Sep 3, 2016

Parse of tables with arbitrary vertical separators #21

Merged
