Cannot "parse" wikimedia #20

Open
tmcfarlane opened this issue Aug 31, 2016 · 2 comments

tmcfarlane commented Aug 31, 2016

Summary

If you load the default input by refreshing the page, switch the output style to wikimedia, and hit "parse" on the output, you get an error prompt. When you hit "OK", the input goes blank but the output remains. Switching the output style afterwards affects neither the input nor the output.

Investigation

Current default input (with correct tabs):

Col1    Col2    Col3    Numeric Column
Value 1 Value 2 123 10.0
Separate    cols    with a tab or 4 spaces  -2,027.1
This is a row with only one cell

Current wikitable output of above input:

{| class="wikitable"

! Col1                             
! Col2    
! Col3                   
! Numeric Column 
|-

| Value 1                          
| Value 2 
| 123                    
| 10.0           
|-

| Separate                         
| cols    
| with a tab or 4 spaces 
| -2,027.1       
|-

| This is a row with only one cell 
|         
|                        
|                
|}

Prompt
[Screenshot: ascii-tables-error-prompt]

Contributor

dwesely commented Sep 1, 2016

It looks like tables can only be parsed under specific circumstances at the moment. The first line must contain a distinct character that indicates where the columns are, and all the remaining column separators must line up with the separators in the header line.

A more generalized solution might be to compare each line to see where there are (non-alphanumeric?) characters that are the same all the way from the bottom to the top of the table to be parsed, or at least are tied for the most in a single column.
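To make the idea concrete, here is a minimal sketch of the strict variant only: it looks for character positions where the same non-alphanumeric character appears on every non-empty line. The names `findSeparatorColumns` and `splitAtColumns` are hypothetical and are not part of the project. It does not implement the "tied for the most" fallback, so a border line that uses a different character (for example `+` instead of `|`) would defeat it.

```javascript
// Hypothetical helper: return the character positions at which every
// non-empty line has the same non-alphanumeric, non-whitespace character.
function findSeparatorColumns(lines) {
  const rows = lines.filter((line) => line.trim() !== "");
  if (rows.length === 0) return [];
  // Only positions present in every row can be shared separators.
  const width = Math.min(...rows.map((row) => row.length));
  const columns = [];
  for (let i = 0; i < width; i++) {
    const ch = rows[0][i];
    if (/[A-Za-z0-9\s]/.test(ch)) continue; // letters, digits, spaces are not separators
    if (rows.every((row) => row[i] === ch)) columns.push(i);
  }
  return columns;
}

// Hypothetical helper: cut one line at the given separator positions.
// A separator at the very start or end of the line yields an empty edge cell.
function splitAtColumns(line, columns) {
  const cells = [];
  let start = 0;
  for (const col of columns) {
    cells.push(line.slice(start, col).trim());
    start = col + 1;
  }
  cells.push(line.slice(start).trim());
  return cells;
}
```

A caller would run `findSeparatorColumns` once over the whole table and then apply `splitAtColumns` to each line, discarding the empty edge cells if the table has outer borders.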

Dealing with HTML and wikimedia syntax would take a bit more work. There is a JavaScript implementation of an HTML-to-CSV parser here: https://gist.github.com/adilapapaya/9787842
I didn't see a wikimedia table parser written in JavaScript, but I suspect I'm just not using the right search terms.
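
For reference, a rough sketch of what parsing could look like for just the one-cell-per-line wikitable form quoted in the first comment. The function name `parseSimpleWikitable` is hypothetical, and this is not a general wikimedia table parser: it assumes every cell sits on its own line starting with `!` or `|`, and it ignores inline `||` cells, captions, and attributes.

```javascript
// Hypothetical helper: parse only the simple wikitable layout shown in
// this issue into an array of rows, each row an array of cell strings.
function parseSimpleWikitable(text) {
  const rows = [];
  let current = null; // cells collected for the row in progress
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("{|")) continue; // blank line or table start
    if (line.startsWith("|}")) break; // table end
    if (line.startsWith("|-")) {
      // Row separator: close the row in progress, if any.
      if (current) rows.push(current);
      current = null;
    } else if (line.startsWith("!") || line.startsWith("|")) {
      // Header cell (!) or data cell (|): strip the marker and padding.
      if (!current) current = [];
      current.push(line.slice(1).trim());
    }
  }
  if (current) rows.push(current);
  return rows;
}
```

One known limitation of this sketch: a cell whose text begins with `-` immediately after the `|` marker would be misread as a row separator. The output quoted above pads cells with a space, so it is not affected.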

dwesely referenced this issue Sep 4, 2016
Checks if spaces are being used as vertical separators if no other separators are found.
Corrected the recursion to remove the first element from the array each iteration.
Combined notifications so only one alert box is ever shown when the user uses the parse functionality.
Author

tmcfarlane commented Sep 17, 2016

Regarding the original issue, I don't think parsing wikimedia is actually much of a priority here. There are so many different table formats wikimedia supports and I don't really understand why. It's almost like they had an old way, then changed it, and never removed the first one. But since you are attempting to parse in JavaScript, it can definitely get tricky, as you said. Since the table definitions aren't consistent, I'd just avoid that idea altogether.

I'd honestly disable the parse button when wikimedia is selected for now and just show a message below it letting people know it's not supported. This will keep the site looking good, though I don't know how many people besides myself have tried or would try this.
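
Something along these lines might be enough, as a sketch only. The style name `"wikimedia"`, the function names, and the `button` and `note` elements are all assumptions on my part; I have not checked how the page actually names or wires them.

```javascript
// Assumption: output styles are identified by a string such as "wikimedia".
const UNPARSEABLE_STYLES = new Set(["wikimedia"]);

// Hypothetical helper: decide whether parsing is offered for a style and
// what notice to show when it is not.
function parseAvailability(style) {
  const supported = !UNPARSEABLE_STYLES.has(style);
  return {
    enabled: supported,
    message: supported ? "" : `Parsing ${style} output is not supported yet.`,
  };
}

// Hypothetical wiring: `button` is the parse button and `note` is an
// element below it that displays the notice.
function applyParseAvailability(style, button, note) {
  const { enabled, message } = parseAvailability(style);
  button.disabled = !enabled;
  note.textContent = message;
}
```

Calling `applyParseAvailability` whenever the output style changes would keep the button state and the notice in sync.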
