Question about the syntax changes on the cli #7

Open
gonssal opened this issue Dec 29, 2021 · 7 comments

Comments

@gonssal

gonssal commented Dec 29, 2021

With the old CLI, you could run ferret --cdp http://127.0.0.1:9222 script.fql and it would work without problems. What is the equivalent command with the latest CLI version? I tried the following:

  • ferret exec --browser-headless --browser-address http://127.0.0.1:9222 script.fql fails with not supported: CLICK(...)
  • ferret exec --browser-address http://127.0.0.1:9222 script.fql fails with not supported: CLICK(...)
  • ferret exec --runtime http://127.0.0.1:9222 script.fql and ferret exec --runtime http://127.0.0.1:9222 --browser-headless script.fql both return HTML code with the title Headless remote debugging
@ziflex
Member

ziflex commented Dec 29, 2021

If it complains that CLICK is not supported, it means you are using the in-memory HTTP driver and need to switch to the CDP one inside your query.

LET doc = DOCUMENT('my-page', { driver: "cdp" })

CLICK(doc, "#my-button")

RETURN TRUE

@gonssal
Author

gonssal commented Dec 29, 2021

Yeah, that did the trick. I'm really sorry about wasting your time with these seemingly stupid issues; I'm finding it hard to work productively with ferret.

Things that are simple in virtually any programming language become exceedingly difficult in FQL.

@ziflex
Member

ziflex commented Dec 30, 2021

Hey, I'm sorry to hear that you are having difficulties. What could be done to make it better?

@gonssal
Author

gonssal commented Dec 30, 2021

I guess most of the issues are due to the declarative (functional?) design you chose for FQL and how it works.

For example, a real estate site I'm crawling has some data on each property with this HTML:

<ul class="props">
	<li>
		<div><span class="icon-wa50-sup"></span> m<sup>2</sup></div> 
		<div>215</div>
	</li>
	<li>
		<div><span class="icon-wa50-bed"></span> Rooms</div> 
		<div>4</div>
	</li>
	<li>
		<div><span class="icon-wa50-bath"></span> Bathrooms</div>
		<div>3</div>
	</li>
	<li>
		<div><span class="icon-wa50-parking"></span> Parking</div>
		<div><i class="icon-wa50-check"></i></div>
	</li>
</ul>

Not every property has all of these elements, so to know what I'm getting, I came up with this (some irrelevant code omitted):

LET property = {
	URL: propertyUrl,
	Title: TRIM(INNER_TEXT(propDoc, '.cardSlider > .body h1.titulo')),
	Reference: SUBSTITUTE(TRIM(INNER_TEXT(propDoc, '.cardSlider > .body .ref')), 'Ref.', ''),
	Description: TRIM(INNER_TEXT(propDoc, '.cardSlider > .body #descripcion_larga')),
	Price: SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(INNER_TEXT(propDoc, '.cardSlider > .body .precio')), '€', ''), '.', ''), ',', '.'),
	Currency: 'EUR',
	AreaUnit: 'm2',
	Images: images,
	Type: 'buy'
}
LET propertyDataElements = ELEMENTS(propDoc, '.cardSlider > .body > .props:not(.props2) li')
LET propertyDataIndexes = (
	FOR propData IN propertyDataElements
		LET data = (
			ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
				ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
					ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
						ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : none
					)
				)
			)
		)
		RETURN data
)
LET propertyDataValues = (
	FOR propData IN propertyDataElements
		LET data = (
			ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
				ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
					ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
						ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : none
					)
				)
			)
		)
		RETURN (data == 'Parking' ? (ELEMENT(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : none) : TRIM(INNER_TEXT(propData, 'div:nth-child(2)')))
)
RETURN MERGE(property, ZIP(propertyDataIndexes, propertyDataValues))

As you can see, the lack of if/else forces me to nest a lot of ternary operators. Also, to add fields to the property object I have to build 2 separate arrays, one for the keys and one for the values, and then use ZIP(). I was expecting to be able to do something like this instead:

property[data] = (data == 'Parking' ? (ELEMENT(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : none) : TRIM(INNER_TEXT(propData, 'div:nth-child(2)')))

This is just one recent example.

@ziflex
Member

ziflex commented Dec 30, 2021

Well, certain limitations were done intentionally while others were a result of the source of inspiration.
Regarding the lack of if/else I do not see how it would simplify your logic, you would still have nested conditions.

And again, most of the time it's the way you solve particular problems and you just need to switch your thought process from imperative to declarative flow.
You can always switch to xpath and do something like this.

@gonssal
Author

gonssal commented Dec 30, 2021

> Well, certain limitations were done intentionally while others were a result of the source of inspiration. Regarding the lack of if/else I do not see how it would simplify your logic, you would still have nested conditions.

else if helps avoid nesting. A switch could also be used instead. In my example there are only 4 conditions; imagine the nesting if there were 15 or more.

> And again, most of the time it's the way you solve particular problems and you just need to switch your thought process from imperative to declarative flow. You can always switch to xpath and do something like this.

The problem was not getting the data, but knowing what type of data it is and appending it to the already existing property object. Instead of doing property[data] = value in a single FOR, I had to create two arrays of the same size, one with the keys and one with the values, and then ZIP and MERGE. In the GitHub example you linked, imagine having an existing object like this:

LET stargazers = {
    "ziflex": "clock",
    "MontFerret": "organization",
    "Kremlin": "location"
}

And then in the example, instead of just showing the icon type, you wanted to iteratively append to the stargazers object with the username as property name. Something like APPEND(stargazers, {"Gusyatnikova": "organization"}) inside a FOR.

I realize it's a design issue, it's the first thing I said, but sometimes I just feel that if I could write the scripts in, for example, JS, I would be saving a lot of time overthinking how to make things work. And please don't get me wrong, I love ferret and I think it's a great piece of software.

@ziflex
Member

ziflex commented Dec 31, 2021

No worries, you are sharing the problems you are facing with using Ferret and that's fine!

Yes, I admit that it might be frustrating at times not being able to mutate objects in queries. I will think about how we can mitigate it in future releases.
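
Until something like that exists, one possible way to avoid repeating the ternary chain in the property example above is to classify each row once, return a key/value pair per row, and only then split the pairs for ZIP. This is an untested sketch rather than an official pattern: it relies only on functions already used in this thread (ELEMENT_EXISTS, INNER_TEXT, TRIM, ZIP, MERGE), it assumes property and propertyDataElements are defined as in the earlier query, and the k/v field names and the FILTER step are illustrative additions.

```fql
// Assumes `property` and `propertyDataElements` exist as in the earlier query.
LET pairs = (
	FOR propData IN propertyDataElements
		// Classify the row once, using the same icon checks as before.
		LET k = (
			ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
				ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
					ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
						ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : NONE
					)
				)
			)
		)
		// Illustrative addition: skip rows whose icon is not recognized.
		FILTER k != NONE
		// Parking is a check mark rather than text, so it is handled separately.
		LET v = (
			k == 'Parking'
				? (ELEMENT_EXISTS(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : NONE)
				: TRIM(INNER_TEXT(propData, 'div:nth-child(2)'))
		)
		RETURN { k: k, v: v }
)
// Split the pairs into the two parallel arrays that ZIP expects.
LET keys = (FOR p IN pairs RETURN p.k)
LET vals = (FOR p IN pairs RETURN p.v)
RETURN MERGE(property, ZIP(keys, vals))
```

Because both arrays come from the same pairs list, they always have the same length and order, which removes the need to keep two separate loops in sync. It does not remove the nested ternaries themselves.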
