Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Parser for most of the Mensas from Studentenwerk Thueringen #79

Merged
merged 17 commits into from
Jul 28, 2018

Conversation

Anarchoschnitzel
Copy link
Contributor

@Anarchoschnitzel Anarchoschnitzel changed the title Added Parser for the mensas in Ilmenau Added Parser for most of the Mensas from Studentenwerk Thueringen Apr 28, 2018
@j-maas
Copy link
Collaborator

j-maas commented Apr 28, 2018

Hey, thanks for your PR!

My current implementation marks days in the past as closed.

I'm not fully sure what you mean, and I understand your code enough to verify if you did it correctly. OpenMensa keeps a history of all parsed days. I assume you mean when today's menu (say Wednesday, 2018-05-02) contains entries for the past week or the day before (2018-04-23, or 2018-05-01), then the correct behavior is to just output them as well. So the feed will simply contain all the days from the menu, and OpenMensa will update the "old" days accordingly.

I will just quickly go through your code and point out a few general things that you could improve. ;)

@Anarchoschnitzel
Copy link
Contributor Author

Hi Y0hy0h, what I mean by past days as closed is the following:
The websites I pull the information from only shows the menus from the current day on and not from days in the past. These days are currently marked as closed.
Example:
The parser is called on a Wednesday and returns alle meals from Wednesday, Thursday and Friday.
It then determines Saturday and Sunday as closed.
As no meals were parsed for Monday and Tuesday (which are in the past from the current day) these days are marked as closed.
Do you think it would be better to return no info about these day in the past at all?

Copy link
Collaborator

@j-maas j-maas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some very simple things you can do to break down your code, make it more readable. Also you make it more robust and debuggable by improving your exception handling.

closed_date = mensa_start_date + timedelta(days=i-2)
canteen.setDayClosed(closed_date.date())
except Exception:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous, because you're swallowing all exceptions. Prefer making the try block as short as possible and only excepting exactly the errors you are interested in. Or even better, replace it with a clear check (using e. g. if). That way it becomes clear what you are checking, and you do not accidentally swallow other exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked that part to get rid of the long and nested try block.
Thanks for your recommendations 👍

from datetime import datetime, timedelta
from pyopenmensa.feed import LazyBuilder

calendar_week_regex = re.compile('\d{4}-\d{1,2}$')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer declaring variables close to where they are used. Readability trumps performance, since you can always optimize readable code, when you've determined that the performance is important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the variables to the functions they are used at except one which is used at multiple places

for fee_candidate in fees:
fee_string = fees_regex.findall(fee_candidate.text)
for s in fee_string:
amount_strings = amount_regex.findall(s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deeply nested. Prefer extracting a function and name it after what happens here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted and split the parsing into 3 different functions

current_date = mensa_start_date + timedelta(days=day_id-2)
meals_data = day.find_all('tr')
for meal_data in meals_data:
meals = meal_data.find_all('td')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: Extract a function. Small functions are awesome, because you can quickly understand them, whereas such a big function is difficult to read.

Anywhere you can reasonably extract a function, do it. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted and split the parsing into 3 different functions

Changed default behaviour of non-available days in the past so that they are ignored instead of treated as closed.
Removed try-catch clause
…ed (could happen at some canteens).

Added error handling if the start date of a canteens current week could not be parsed
@Anarchoschnitzel
Copy link
Contributor Author

I took care of the changes you requested.
Thanks for your feedback 👍

@j-maas
Copy link
Collaborator

j-maas commented Apr 28, 2018

Do you think it would be better to return no info about these day in the past at all?

Yes, that's what I would do. ;)

@Anarchoschnitzel
Copy link
Contributor Author

Anarchoschnitzel commented Apr 30, 2018

Yes, that's what I would do.

This is already changed in the commits that also addressed the changes you requested.

@j-maas
Copy link
Collaborator

j-maas commented Apr 30, 2018

Sorry, I forgot to say that I'd like to wait for @klemens's (or another maintainer's) approval. I'm quite new to this repo so I have no experience merging new parsers.

@klemens hinted that he might take some time to do that in a while. I hope it's ok for you to wait until that happens.

Copy link
Collaborator

@klemens klemens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this parser!

I just had a little bit of time and so this is only a first round of review. I have not yet looked at the parse_fees and parse_meals functions.

There are some problems that need to be fixed and I also left a couple of improvement suggestions. Also python doesn't mind if you put an empty line here and there and it will improve readability quite a lot. 😉

http://localhost:9090/thueringen/je-zeiss/full.xml
http://localhost:9090/thueringen/nh-mensa/full.xml
http://localhost:9090/thueringen/sk-mensa/full.xml
http://localhost:9090/thueringen/we-horn/full.xml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only testing one should be enough.

fees = parse_fees(document)
if len(fees) is 2:
employees_fee = fees[0]
guests_fee = fees[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just define the defaults in the parse_fees function or as "constants" in the global scope. Also len(fees) is always 2 and you should check using == instead of is (see below).

def parse_start_date(document):
week_start_end_date_regex = re.compile('\d{1,2}\.\d{1,2}\.\d{4}')
calendar_week_regex = re.compile('\d{4}-\d{1,2}$')
options = document.find_all('option', value=calendar_week_regex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The select box has name="selWeek", so you don't need to search the whole document.

calendar_week_regex = re.compile('\d{4}-\d{1,2}$')
options = document.find_all('option', value=calendar_week_regex)
assert len(options) > 0
options_filtered = list(filter(lambda x: x.has_attr('selected'), options))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BS can search for multiple attributes at once:

doc.find("option", value=calendar_week_regex, selected=True)

You could also remove the regex if you just generate the correct value for the current week directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, I am fairly new to BS and had my problems with that one ;-)

try:
mensa_start_date = parse_start_date(document)
except AssertionError:
return canteen.toXMLFeed()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this happen on weekends? I would prefer just returning None when the error is expected (like on a weekend) and otherwise just let the exceptions terminate the parser, because otherwise there won't be any notification if the website changes and the parsers fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your feedback above, I changed the parse_start_date function so it should not throw an exception (unless something goes really wrong and as you stated in that case we want the parser to fail).
As a result the parse_date function is also a lot shorter than before :)

else:
canteen.addMeal(current_date.date(), 'Hauptgerichte', main_dish, notes, costs,
None)
for i in range(min(days_open), 9):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if days_open is empty. Also i should be day_id like above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this line into for day_id in range(min(days_open or [9]), 9): so if days_open is empty the loop will be skipped

for i in range(min(days_open), 9):
if i not in days_open:
closed_date = mensa_start_date + timedelta(days=i-2)
if not today or closed_date.day is datetime.today().day:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above regarding is and the date comparison.

for meal in meals:
main_dish = meal[0]
notes = meal[1]
costs = meal[2]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can destruct this in one line:

main_dish, notes, costs = meal

if guests_fee is not -1:
costs['other'] = costs['student'] + guests_fee
if today:
if current_date.day is datetime.today().day:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is operator compares object identity. While this "works" in this case due to an implementation detail, you should really just do a regular compare: == Also you should compare the date objects directly and not the day property.

None)
else:
canteen.addMeal(current_date.date(), 'Hauptgerichte', main_dish, notes, costs,
None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should check if today and current_date != datetime.today() before parsing the day and just continue if it is the case.

	- removed additional testing urls
	- reworked parse_start_date function
	- global constants for default fees
	- using != and == instead of is and is not
	- comparing date objects directly
	- handling empty days_open list
	- some general stuff he mentioned
@Anarchoschnitzel
Copy link
Contributor Author

I took care of the changes you requested.
Thanks for your feedback 👍

@Anarchoschnitzel
Copy link
Contributor Author

Hi there, I know you are pretty busy atm, but I just wanted to check if there is any progress regarding the PullRequest :)

Copy link
Collaborator

@klemens klemens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good already!

The only missing larger issue is the use of readable ingredient notes. It would also be nice to extend the feed to the following two weeks, but we can also implement this later.

The rest are just some minor improvement suggestions.

fees = parse_fees(document)

employees_fee = fees[0]
guests_fee = fees[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

employees_fee, guests_fee = parse_fees(document)

if employees_fee != employees_fee_default:
costs['employee'] = costs['student'] + employees_fee
if guests_fee != guests_fee_default:
costs['other'] = costs['student'] + guests_fee
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use None as the missing values, then you don't need the "constants" and can just check using is not None here.

def parse_fees(document):
fees_regex = re.compile('Bedienstete.*')

fees_p = document.find_all('p', {'class': 'MsoNormal'})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use class_ instead of the dictionary.

Also, you can directly search for the text using the regex:

fees = document.find_all('p', class_='MsoNormal', string=fees_regex)

continue

canteen.addMeal(current_date.date(), 'Hauptgerichte', main_dish, notes, costs,
None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coudn't we use Mittag Ausgabe A, Mittag Ausgabe B, etc as categories instead of Hauptgerichte for everything?


for meal_data in meals_t_rows:
meal_t_datas = meal_data.find_all('td')
if len(meal_t_datas) is not 3:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use != instead of is not.


ingredients = ingredients_regex.findall(meal_t_datas[1].text)
if len(ingredients) > 0:
notes.append(ingredients[0].strip())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a single note for every ingredient. Also please parse the table at the bottom of the page to write out the abbreviations in full. There are some helpers for this in https://github.com/mswart/pyopenmensa/blob/7b6e7a70ab63530af27b985019f28863f8b8f368/feed.py#L182-L246 that can parse ingredient tables and automatically replace the abbreviations. I can also help you out with this if you want.

students_fee = float(students_fee_string[0].replace(',', '.'))
costs = {'student': students_fee}

meals.append((main_dish, notes, costs))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just yield the tuples instead of storing them in a list and then returning the list.

canteen = LazyBuilder()

content = urlopen(url).read()
document = parse(content, 'lxml')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are only parsing the current week currently, which means that the full fead does not contain the meals for Monday on Sunday, but last Friday instead, which is not very useful.

To get the next week, you have to send a request with the selWeek=2018-25 parameter, where 25 is the calender week. It seems you can also get a list view using selView=liste which contains the explicit date, which might be easier to parse that the tabbed view.

The website uses POST-requests, but it seems GET also works: https://www.stw-thueringen.de/deutsch/mensen/einrichtungen/ilmenau/mensa-ehrenberg.html?selWeek=2018-25&selView=liste

- implemented the syntactic changes klemens requested
- categories are now named by Ausgabe A,B,C etc. according to the studentenwerk website
- ingredients are parsed according to the table on the bottom of the studentenwerk website, and appended in full as notes (one note for each ingredient / allergen) to the meals
- meals are fetched for the current week + 2 as klemens suggested via GET request (selWek=...)
@Anarchoschnitzel
Copy link
Contributor Author

Anarchoschnitzel commented Jun 20, 2018

First of all, thanks for your feedback! I haven't worked much with python so far so every bit of feedback that helps me to enhance my knowledge of the language is great 👍
I have only one small question left regarding the changes I implemented: as I am now fetching the meals for the canteens for the current week + 2 - as you suggested - it sometimes happens that the meals for 2 weeks in advance are not fully present on the Studentenwerks pages. This results in some week tabs not being there which my parser interprets as "canteen closed on that day" (which is in theory possible during the week etc. for holidays). Now my question is: if the OpenMensa server calls my parser later again and the data for the week is there so days that were previously returned as closed are returned as open and with meals, will it overwrite the existing information for "closed on that day" or ignore it?

Thanks for your help in advance :)

-btw I referenced the wrong issue in my commit message by accident, sorry for that ;)

@klemens
Copy link
Collaborator

klemens commented Jun 20, 2018

if the OpenMensa server calls my parser later again and the data for the week is there so days that were previously returned as closed are returned as open and with meals, will it overwrite the existing information for "closed on that day" or ignore it?

I am not 100% sure, but I think that newer data always overwrites older data. Alternatively, you could just return no data for those days and not bother with the "canteen closed" status at all.

I referenced the wrong issue in my commit message by accident, sorry for that ;)

Don't worry, we can clean up the commit history before merging!

- days that dont appear on a tab on the studierendenwerk webpage are now ommited instead of returned as closed
- now with correct PR link ;-)
@Anarchoschnitzel
Copy link
Contributor Author

Allrighty, days that don't appear on the websites tabs are now being omitted 👍

Copy link
Collaborator

@klemens klemens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update! We are almost there, just some minor fixes remain. I also added some improvement suggestions, as you seem to like those. 😉


def parse_url(url, today=False):
year = datetime.now().isocalendar()[0]
week = datetime.now().isocalendar()[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

year, week, _ = datetime.now().isocalendar()


canteen = LazyBuilder()

for i in range(0, 3):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Zero is the default, so you can just use range(3).


for i in range(0, 3):

content = urlopen(url+'?selWeek='+str(year)+'-'+str(week + i)).read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formats the week number without a leading zero if it is less then 10. Now I have no idea what this number should look like (and I guess we have to wait until the end of the year to find out), but if I were to implement the website, I would use a leading zero. 😉 You can use a format string to format the number (and the other arguments):

"{}?selWeek={}-{:02}".format(…)

day_id = int(day['id'][-1:])

current_date = mensa_start_date + timedelta(days=day_id-2)
digits, chars = parse_ingredients(document)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same for all days, so you only need to parse it once.

option = document.find("option", value=calendar_week_regex, selected=True)

if option is None:
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you return None here? You don't check for None after calling the function and I think the parser should fail if it can't extract the date.

if chars.get(ingredient) is not None:
ingredients.append(chars[ingredient][1:])

return ingredients
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this whole function in one line using list comprehensions if you want to:

notes = [legend[note] for note in notes if note in legend]

if len(category) == 1:
category = category[0]
else:
category = 'Hauptgerichte'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(category) == 1 will always be true, because you just created a list with length 1 above? Did you mean to check if the string was empty?

ingredients = ingredients_regex.findall(meal_t_datas[1].text)

if len(ingredients) > 0:
ingredients_list = ingredients[0].strip('Inhalt: ').strip('\r').split(',')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use strip() instead of strip('\r'). Also you should strip the resulting list items (maybe combined with the list comprehension from above?), because there may be some spaces in there sometimes.


canteen = LazyBuilder()

for i in range(0, 3):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably only fetch the current week when the today parser is requested.

costs['employee'] = costs['student'] + employees_fee
if guests_fee is not None:
costs['other'] = costs['student'] + guests_fee
if today and current_date != datetime.today():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./parse.py thueringen il-ehrenberg today.xml currently returns an empty result for me, because this comparison between dates is not actually between dates, but between datetimes. It should be current_date != date.today() and parse_start_date should return a date instead of a datetime (just use .date() before returning).

- implemented the proposed syntactic changes
- weeks to be fetched now determined from <select> elements options in the document so only weeks that are available from the website are being fetched. As a bonus we don't need to care about formatting with leading zeroes etc.
- variables that don't change are only being parsed once
- fess should be parsed even if more paragraphs with Bedienstete etc. exist
- reworked the ingredients parsing as proposed which ultimately led to the drop of a whole function as this could be done in a one-liner
- dates should now be correctly compared without time so the today feed should work again
@Anarchoschnitzel
Copy link
Contributor Author

Anarchoschnitzel commented Jun 21, 2018

I implemented the proposed changes.
I also found a ( I think better / more reliable) way to fetch meals for upcoming weeks, as I now parse the selection element from the document which has all available weeks for the future as options. This should also remove the necessary of bothering with leading zeroes in the ?selWeek parameter

Copy link
Collaborator

@klemens klemens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, parsing the actual values is much better!

The code looks really good now, there are just some optimizations remaining. 👍

for s in table.stripped_strings:
if groups_regex.search(s):
split = s.split(':')
groups[split[0]] = split[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object returned by search already contains the matched groups (the parenthesis in the regex), which can be accesses using group(1) and group(2), so you don't need split. You should also strip the description here instead of in the list comprehension in line 84.

ingredients = ingredients_regex.findall(meal_t_datas[1].text)

if len(ingredients) > 0:
ingredients_list = ingredients[0].strip('Inhalt: ').strip().split(',')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this in the last review. strip('Inhalt: ') probably does not do what you think it does. (I suggest just using [8:] or a regex)

content = urlopen(url).read()
document = parse(content, 'lxml')

available_weeks = parse_available_weeks(document)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fetches the current week's site twice. I suggest skipping the current week and using the already available document for adding the meals. It probably makes sense to split the parse_url function into two functions, one which fetches the sites and checks which are available and the other does the actual parsing (what's currently in the for loop).

costs['employee'] = costs['student'] + employees_fee
if guests_fee is not None:
costs['other'] = costs['student'] + guests_fee
if today and current_date != date.today():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still fetches all sites, even when just the first one is needed. You need another check that only fetches the current site for today feeds (see also the comment above).



def parse_available_weeks(document):

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I find empty lines after function definitions of for loops (see below) a little bit confusing. They also seem to confuse the git diff algorithm. 😉

- implemented the proposed syntactic changes
- removed blank lines under function definitions
- using matched groups instead of string splitting for ingredients parsing
- sites should only be fetched when necessary (first page only once, following pages only if not today etc.)
- parsing of meals for canteen now separated in new function parse_meals_for_canteen so parse_url only responsible for fetching necessary information (html documents, weeks, groups, fees etc.) and returning the xml feed
@Anarchoschnitzel
Copy link
Contributor Author

I implemented the proposed changes - thanks for your feedback 👍

@Anarchoschnitzel
Copy link
Contributor Author

I just found out my solution might not work properly yet. If accessed on weekends / Sundays the Studentenwerk page does not auto select a week which currently crashes the parser on those days. I am working on a fix.

If the start date of the initial loaded page cannot be parsed (can happen on weekeends when the seleciton box only shows ---Auswahl Treffen---) the first load of the page by ?selWeek isn't skipped.
@Anarchoschnitzel
Copy link
Contributor Author

Should be fixed now

@klemens
Copy link
Collaborator

klemens commented Jul 28, 2018

Sorry for – once again – taking so long. The changes look good, I just pushed a small change to replace the assert with a normal check for Null, as exceptions should (normally) not be used for control flow.

I will now merge this (sqashed into one commit) into the master branch, which means that it will become available at http://omfeeds.devtation.de/thueringen/*location*/{full,today}.xml within a day or so. Then you can request an update or the transfer of the parser to your account, if you want. Otherwise, I can also do this.

Thanks again for writing the parser and following my (sometimes nitpicky) review! 😁

@klemens klemens merged commit 9691720 into mswart:master Jul 28, 2018
@Anarchoschnitzel
Copy link
Contributor Author

Thanks for the update!
If I find the time I will head over to the openmensa project page and request an update in the upcoming days.

Again thanks for the continuous feedback, was really fun writing this parser and learning a few new things and best practices about python 👍

@klemens
Copy link
Collaborator

klemens commented Feb 6, 2019

Soooo, I just moved all of the already existing canteens into one parser (they were separated by city) and adapted their names to your new naming scheme. After adding the new index-url of my server, they started to work immediately. 😉

I did not add the other canteens yet, mainly because I wanted to automate it (see the halle parser and my changes to it today). My plan is as follows: The city overview page (eg Jena) contains a list of canteens including their addresses. It also contains a hidden form, which contains geo-coordinates for all canteens (used for the map display). This way we should be able to generate the complete metadata info and avoid copying it by hand into openmensa (which of course would be much faster, but missing all the fun 😁).

@klemens
Copy link
Collaborator

klemens commented Feb 9, 2019

I implemented the metadata extraction in 90b95c9 and fixed a small issue, where an empty meal name would lead to an exception, because they are not allowed. Now, empty meals are just skipped: fddcca4

The extraction worked great for all but three canteens:

These three were missing in the map display, so I created them manually:

I did some quick sanity checks (the coordinates are in the right city), but it would be great if you could also take a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants