Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Found unique recipe schema instrucitons #230

Closed
jrvanderveen opened this issue Sep 30, 2020 · 12 comments
Closed

Found unique recipe schema instrucitons #230

jrvanderveen opened this issue Sep 30, 2020 · 12 comments

Comments

@jrvanderveen
Copy link
Contributor

Hi,
Was working on a PR for a new site https://www.feastingathome.com/tomato-risotto/ (Small but good blog site). It uses a recipe schema however it contains a interesting difference in the instructions that is not accounted for. There is a set of optional instructions. EX:
It has 5 steps that work fine and contain the "text" attribute
{"@type":"HowToStep","text":"In a large skillet, heat oil over medium-high heat. Add tomatoes (whole) and sear, stirring occasionally, until they burst and soften, about 7 minutes. Turn heat off. Chop if extra-large.","name":"BLISTER TOMATOES","url":"https://www.feastingathome.com/tomato-risotto/#instruction-step-1"},

But it contains a set of optional instructions that look like this
{"@type":"HowToSection","name":"Optional Seared Prawns:","itemListElement":[{"@type":"HowToStep","text":"If adding the prawns, mix spices and salt in a bowl. Coat shrimp with the spices. Heat 2-3 tablespoons oil in a skillet (you may need to do this in batches) over medium-high heat, sear each side 2-3 minutes or until cooked through. \u00a0Top the risotto with the seared prawns.","url":"https://www.feastingathome.com/tomato-risotto/#instruction-step-6"}]}]
It apearrs to be a nested set of optional instructions contained within the itemListElement attribute.

This causes the get instructions to fail in _schemaorg.py

    def instructions(self):
		instructions = self.data.get("recipeInstructions") or ""
		if type(instructions) == list:
			return "\n".join(
				normalize_string(instruction.get("text")) <----------- the optional steps have no "text" attribute so this returns none and fails in normalize_string().
				if type(instruction) is dict
				else normalize_string(instruction)
				for instruction in instructions
			)
		return instructions

Im down to work on this. Wanted to get your opinions on it first however.

@jksimoniii
Copy link
Contributor

jksimoniii commented Sep 30, 2020

@jrvanderveen I came across this too (different site) just the other day.

The schema for Recipe is defined here, basically saying you can expect recipeInstructions it to be a string or a list of HowToStep or HowToSection items. And to your point, the current implementation of SchemaOrg.instructions does not take into account HotToSection items.

Here is the solution that I've been testing. I don't love the implementation, because it's not the most readable. But basically, build a string by looping through recipeInstructions and checking @type in each iteration. If we get a HowToSection we need to iterate through itemListElements (I'm using map()). If we get a HowToStep than just concatenate text to our return value.

        def instructions(self):
            # TODO (@jksimoniii): Contribute this back upstream
            instructions = self.schema.data.get('recipeInstructions', '')
            ret = ''
            if type(instructions) != list:
                return instructions
            elif type(instructions) == list:
                for instruction in instructions:
                    if type(instruction) == str:
                        ret += f'{instruction}\n'
                    elif instruction.get('@type', None) == 'HowToSection':
                        ret += '\n'.join([normalize_string(instruction.get('name', '')), *list(map(
                           lambda x: normalize_string(x.get('text', None)), instruction.get('itemListElement', list())
                        )), '\n'])
                    else:
                        ret += instruction.get('text', '') + '\n'
            return ret

Edit: Looking at this more now, looks like I missed a normalize_string in the else.

@bfcarpio
Copy link
Collaborator

bfcarpio commented Oct 1, 2020

Thank you both @jrvanderveen and @jksimoniii for raising this issue! Certainly, this is something we'll want to address and will require some thought on implementation.

@jksimoniii I'm brainstorming here: since each of the schema objects obviously conform to a schema could we marshal them into python dataclasses with dacite? That way we can write functionality on the against the dataclass objects?

Pinging @hhursev for thoughts and guidance!

@hhursev
Copy link
Owner

hhursev commented Oct 3, 2020

I decided to go ahead and push this commit. Pull the latest updates and PRs are more than welcome!

Thanks for letting us know about the issue! What @jksimoniii had outlined is the exact cause for this. I've decided to go ahead and implement a "fix" for this specific case.

Which brings us to the next topic raised by @bfcarpio. Schemas are quite .. um

The schemas are a set of 'types', each associated with a set of properties. The types are arranged in a hierarchy.
The vocabulary currently consists of 841 Types, 1369 Properties, and 352 Enumeration values.

complex for parsing. Despite Schema/Recipe being a subset of the vocabulary above, I suspect it won't be easy to organize into python classes/methods hierarchy. (won't be remotely the hardest thing either though). I'd say we keep on patching for now and if our _schemaorg file becomes gruesome then we can start thinking of using or creating a separate package.

@jksimoniii
Copy link
Contributor

Thanks for addressing the issue. The last thing that I would point out is that we may want HowToSection.name included in the result. In the given example (which looks delicious by the way @jrvanderveen -- how did it come out?) I would expect the output to be something like below. Maybe "including sections" can be optional when calling for instructions(include_sections=True)

HowToStep
HowToStep
HowToStep
HowToStep
HowToStep

HowToSection

HowToStep

@hhursev
Copy link
Owner

hhursev commented Oct 3, 2020

Oh man, you got a point here.

HowToSection.name should be included (especially when I look at the url here). I plan on submitting an update now which will lead to the following result:

In a large skillet, heat oil over medium-high heat. Add tomatoes (whole) and sear, stirring occasionally, until they burst and soften, about 7 minutes. Turn heat off. Chop if extra-large.
At the same time, in a large heavy-bottomed pot or dutch oven, heat the olive oil over medium heat and add the onions. Saute until golden about 10-12 minutes. Add garlic and thyme, saute 2 more minutes until fragrant.
Add the rice, saute 1 minute, stirring. Add 2 cups warm stock (enough to cover the rice), saffron and smoked paprika, stir and bring to a simmer. Simmer until most of the liquid is absorbed. Add 1 cup broth and the tomatoes and all their juices. Stir until all the liquid is absorbed. Continue adding broth 1 cup at a time, letting the rice absorb it slowly, stirring often over med-low heat, until the rice is plumped, slightly al dente, yet creamy, about 20-25 minutes. You may not need all 8 cups. ( I used 6 3/4).
Stir in the butter and parmesan. Season generously with salt, pepper, and optional chili flakes. Taste, adjust salt. If bland, it probably needs more salt.
as a flavorful side or vegetarian main, garnishing with fresh parsley and lemon zest.
Optional Seared Prawns:
If adding the prawns, mix spices and salt in a bowl. Coat shrimp with the spices. Heat 2-3 tablespoons oil in a skillet (you may need to do this in batches) over medium-high heat, sear each side 2-3 minutes or until cooked through. Top the risotto with the seared prawns.

Can you confirm if this is what you too had in mind @jksimoniii . Thanks

@jksimoniii
Copy link
Contributor

Yeah, that looks good to me! I added some extra line breaks in mine for readability, but that's probably worth another conversation around output formatting.

@hhursev
Copy link
Owner

hhursev commented Oct 3, 2020

OR

BLISTER TOMATOES
In a large skillet, heat oil over medium-high heat. Add tomatoes (whole) and sear, stirring occasionally, until they burst and soften, about 7 minutes. Turn heat off. Chop if extra-large.
MAKE RISOTTO
At the same time, in a large heavy-bottomed pot or dutch oven, heat the olive oil over medium heat and add the onions. Saute until golden about 10-12 minutes. Add garlic and thyme, saute 2 more minutes until fragrant.
Add the rice, saute 1 minute, stirring. Add 2 cups warm stock (enough to cover the rice), saffron and smoked paprika, stir and bring to a simmer. Simmer until most of the liquid is absorbed. Add 1 cup broth and the tomatoes and all their juices. Stir until all the liquid is absorbed. Continue adding broth 1 cup at a time, letting the rice absorb it slowly, stirring often over med-low heat, until the rice is plumped, slightly al dente, yet creamy, about 20-25 minutes. You may not need all 8 cups. ( I used 6 3/4).
Stir in the butter and parmesan. Season generously with salt, pepper, and optional chili flakes. Taste, adjust salt. If bland, it probably needs more salt.
Serve
as a flavorful side or vegetarian main, garnishing with fresh parsley and lemon zest.
Optional Seared Prawns:
If adding the prawns, mix spices and salt in a bowl. Coat shrimp with the spices. Heat 2-3 tablespoons oil in a skillet (you may need to do this in batches) over medium-high heat, sear each side 2-3 minutes or until cooked through. Top the risotto with the seared prawns.

and I'll adjust the failing tests from the previously implemented schema sites. It seems the "name" may be a rather important property. Let's vote 3/5. @jksimoniii @bfcarpio @jrvanderveen @jayaddison @hhursev (😄)

Thumbs up for:

BLISTER TOMATOES
In a large skillet, heat oil over medium-high heat. Add tomatoes (whole) and sear, stirring occasionally, until they burst and soften, about 7 minutes. Turn heat off. Chop if extra-large.
MAKE RISOTTO
At the same time, in a large heavy-bottomed pot or dutch oven, heat the olive oil over medium heat and add the onions. Saute until golden about 10-12 minutes. Add garlic and thyme, saute 2 more minutes until fragrant.
Add the rice, saute 1 minute, stirring. Add 2 cups warm stock (enough to cover the rice), saffron and smoked paprika, stir and bring to a simmer. Simmer until most of the liquid is absorbed. Add 1 cup broth and the tomatoes and all their juices. Stir until all the liquid is absorbed. Continue adding broth 1 cup at a time, letting the rice absorb it slowly, stirring often over med-low heat, until the rice is plumped, slightly al dente, yet creamy, about 20-25 minutes. You may not need all 8 cups. ( I used 6 3/4).
Stir in the butter and parmesan. Season generously with salt, pepper, and optional chili flakes. Taste, adjust salt. If bland, it probably needs more salt.
Serve
as a flavorful side or vegetarian main, garnishing with fresh parsley and lemon zest.
Optional Seared Prawns:
If adding the prawns, mix spices and salt in a bowl. Coat shrimp with the spices. Heat 2-3 tablespoons oil in a skillet (you may need to do this in batches) over medium-high heat, sear each side 2-3 minutes or until cooked through. Top the risotto with the seared prawns.

Thumbs down for:

In a large skillet, heat oil over medium-high heat. Add tomatoes (whole) and sear, stirring occasionally, until they burst and soften, about 7 minutes. Turn heat off. Chop if extra-large.
At the same time, in a large heavy-bottomed pot or dutch oven, heat the olive oil over medium heat and add the onions. Saute until golden about 10-12 minutes. Add garlic and thyme, saute 2 more minutes until fragrant.
Add the rice, saute 1 minute, stirring. Add 2 cups warm stock (enough to cover the rice), saffron and smoked paprika, stir and bring to a simmer. Simmer until most of the liquid is absorbed. Add 1 cup broth and the tomatoes and all their juices. Stir until all the liquid is absorbed. Continue adding broth 1 cup at a time, letting the rice absorb it slowly, stirring often over med-low heat, until the rice is plumped, slightly al dente, yet creamy, about 20-25 minutes. You may not need all 8 cups. ( I used 6 3/4).
Stir in the butter and parmesan. Season generously with salt, pepper, and optional chili flakes. Taste, adjust salt. If bland, it probably needs more salt.
as a flavorful side or vegetarian main, garnishing with fresh parsley and lemon zest.
Optional Seared Prawns:
If adding the prawns, mix spices and salt in a bowl. Coat shrimp with the spices. Heat 2-3 tablespoons oil in a skillet (you may need to do this in batches) over medium-high heat, sear each side 2-3 minutes or until cooked through. Top the risotto with the seared prawns.

@jksimoniii
Copy link
Contributor

This is the first time I've seen a name associated with HowToStep. Names are definitely important for sections, probably less important for steps if they fall into sections. But, certainly nice to have!

@bfcarpio
Copy link
Collaborator

bfcarpio commented Oct 3, 2020

For context this HowToSection.name would be relevant for recipes with multiple large steps? Say for tacos there could be one HowToSection.name for making the tortilla and another for the filling?

EDIT: Sorry, I only saw the BLISTER TOMATOES part and not the MAKE RISOTTO. I agree we may have to tweak the string formatting at some point, but I agree. Are they normally all caps in the schema or is that something your code did?

@hhursev
Copy link
Owner

hhursev commented Oct 3, 2020

the caps is from how they've written it on the site. I was about to gather more examples for you but you already gave your vote! 😃 So - finito!

@hhursev
Copy link
Owner

hhursev commented Oct 3, 2020

Didn't mention the issue in the commit thus for references we include "name" starting from here. I'm closing the issue. Thanks everyone for the swift replies!

@hhursev hhursev closed this as completed Oct 3, 2020
@jrvanderveen
Copy link
Contributor Author

jrvanderveen commented Oct 3, 2020

Little late to the party here but I agree with everyone I think the implementation your going with could be really useful and is a more accurate pull of the instruction. Also @jksimoniii the risotto was amazing I would definitely recommend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants