Design and implement the architecture of our analyzer #12
One way to approach this is to display a website within a website using the HTML Inline Frame element (<iframe>), though I ran into some errors with this. Another approach would be to stream a developer’s website from our server to their browser, and then they could remotely interact with it, for instance using chrome.tabCapture, though I would imagine this would be laggy and I’m not sure it would be practical.
I think this is the most promising approach, but there are a lot more moving pieces, such as Selenium requiring a Chrome installation. Do we package Chrome with the SDK, or have the developer link to a copy on their computer? How simple can we make the experience for the developer? I’ll need more time to think about this. Also, @SebastianZimmeck mentioned in the architecture, ‘The developer logs into the developer dashboard of the web app privacy analysis app and starts the analysis.’ Are we expecting to handle user authentication and remember a developer's website analysis results?
Yes, I am imagining something like the dashboard on Google Analytics (to name one example).
I agree. With iframes and streaming the website, it sounds like a lot of things could go wrong. Maybe we can package Selenium and our other tools into a relatively simple-to-install app that developers can run locally (possibly with some SDK in the developer's app if it helps us), and then send the information to our server with the dashboard. So there would be a clear separation between analysis (local) and results (online). The only other solution I can currently think of is to host a virtual machine on our server where the developer runs their web app. This should be, in principle, similar to the local solution, except that the machine is run not locally but on our server.
On the VM idea, I started an f1-micro instance (1 vCPU, 0.6 GB memory, Ubuntu/Linux) on the Google Cloud Platform using Compute Engine. It should be up and running, and both of you, @davebaraka and @rgoldstein01, should have received an email with details on how to access it. @davebaraka, it would be good if you can look into whether that works for our purposes. Feel free to make any changes to the setup of the VM (or otherwise).
As a reminder for myself: this instance should be free for the time being. Here is an explanation of the pricing on Stack Overflow.
Considering the VM instance, where we have a Linux distribution available to us, there still does not seem to be a seamless way to run a developer’s web app in the browser of our web app. I looked into SSH X11 forwarding, which allows you to start remote applications and forward the application display to your local machine, but there is no way to get this to work in the context of our web app and the developer’s browser. To get something where a developer could access and navigate their website within our web app, Electron would be our best bet, as we would basically be building our own browser. Though this would probably require re-exploring how to intercept and decrypt HTTP requests, and a change from our current set of tools.
As discussed on Tuesday, @davebaraka will explore both the modular and the Electron-based architecture. We can see what makes the most sense.
I'm exploring different architectures, and in order to use Selenium, it seems that when distributing our tool we would have to package a version of Chrome, Firefox, or another supported browser with Selenium, or have the developer link to a local copy on their machine. There could be compatibility issues between the Selenium WebDriver and the browser version if a developer linked to a local copy. In other words, I feel like we may run into complications packaging it and creating an easy experience for a developer. As an alternative, and since we are not crawling a site and currently only looking at the POST/PUT data, I created an architecture that revolves around a browser extension. The extension is able to intercept and read network requests, and it can communicate with a web app. Additionally, I can inject JavaScript into webpages, where I imagine we guide a user and give status updates as they navigate their site. We also have a popup dialog available from the extension to provide updates. I've provided a demo of the prototype here. I think I can do pretty much the same thing with Selenium (without a browser extension), but I'm a bit unsure about how we would distribute this to developers. With a Chrome extension, a developer would install the extension and we would move on from there.
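For reference, a minimal sketch of how a background script could capture POST/PUT data with the chrome.webRequest API. This is not the actual prototype, just an illustration; it assumes a Manifest V2 extension with the "webRequest" and "<all_urls>" permissions declared in manifest.json.

```js
// background.js - hypothetical sketch of intercepting request bodies
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    if ((details.method === "POST" || details.method === "PUT") && details.requestBody) {
      // Form submissions arrive already parsed; other payloads arrive as raw bytes.
      if (details.requestBody.formData) {
        console.log(details.url, details.requestBody.formData);
      } else if (details.requestBody.raw) {
        const decoder = new TextDecoder("utf-8");
        const body = details.requestBody.raw
          .map((part) => (part.bytes ? decoder.decode(part.bytes) : ""))
          .join("");
        console.log(details.url, body);
      }
    }
  },
  { urls: ["<all_urls>"] },
  ["requestBody"]
);
```

The captured entries could then be handed to the popup or an injected page script via chrome.runtime messaging.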
Nice. It looks really good. One drawback may be that with a browser extension we would be limited to intercepting the HTTP requests between the web app and its user. We could not intercept any requests between a backend API and the app (issue #16). Generally, we would be limited by the browser sandbox. So, we would need to combine the extension with other tools. Any thoughts on Electron? I found a tutorial on building a web browser with Electron and JS. It seems that Electron is using WebKit; that route would be more complicated, though. Also, whatever browser solution we pick, there is probably not much we can do in terms of intercepting HTTP requests at the backend.
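For comparison, a rough sketch of what request interception could look like in an Electron shell. This is hypothetical and not part of the current prototype; it only assumes Electron's session.webRequest API, which exposes upload data for requests that carry a body.

```js
// main.js - hypothetical Electron main process sketch
const { app, BrowserWindow, session } = require("electron");

app.whenReady().then(() => {
  const win = new BrowserWindow({ width: 1200, height: 800 });

  // Inspect outgoing requests, including POST/PUT bodies, before they are sent.
  session.defaultSession.webRequest.onBeforeRequest(
    { urls: ["*://*/*"] },
    (details, callback) => {
      if (details.uploadData) {
        const body = details.uploadData
          .map((part) => (part.bytes ? part.bytes.toString("utf-8") : ""))
          .join("");
        console.log(details.method, details.url, body);
      }
      callback({}); // let the request proceed unmodified
    }
  );

  win.loadURL("https://example.com"); // placeholder: the developer's site would go here
});
```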
Using a browser extension does not meet the requirements for our analysis. I will explore an alternative setup using Electron (maybe using Selenium with Electron) and another setup using a proxy on a server; mitmproxy seems like a promising HTTPS proxy. At the moment, somehow packaging Selenium and using it to inject JavaScript that connects a developer's website with our web app seems like the most promising route.
@davebaraka had the idea of looking into a reverse proxy to intercept the traffic between, say, an app's database and a third-party data broker. Also, it seems that we are leaning towards Selenium as opposed to Electron.
Although Electron does provide a viable solution, as we can intercept requests, read POST and body data, and we are less restricted in providing a local user experience, we decided to use Selenium and inject JavaScript to provide an experience that guides a developer. I will look into JavaScript injection, since I have yet to test out this functionality in Selenium. We will also need to think ahead to when we package/distribute our tool.
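As a rough illustration of the injection idea, the snippet that a Selenium driver could execute in the developer's page might look something like this. It is purely hypothetical; the banner text and element ID are placeholders.

```js
// guide.js - hypothetical script injected into the developer's site via
// Selenium's execute-script call, to show status updates as they navigate.
(function showAnalysisBanner() {
  if (document.getElementById("analyzer-status-banner")) return; // avoid duplicates

  const banner = document.createElement("div");
  banner.id = "analyzer-status-banner";
  banner.textContent = "Privacy analysis in progress - keep navigating your site.";
  banner.style.cssText =
    "position:fixed;top:0;left:0;right:0;z-index:2147483647;" +
    "background:#1a73e8;color:#fff;padding:8px 16px;font-family:sans-serif;";
  document.body.appendChild(banner);
})();
```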
add browser extension to inject javascript #12
As we have switched from a developer tool to a user tool, there is probably not a whole lot to say beyond that we will be creating a browser extension. However, I am leaving this issue open for now in case something comes up.
I've scaffolded the browser extension. @rgoldstein01 @notowen333, to get started:
Unfortunately, Chrome doesn't yet have a webRequest API to get response body data. There have been talks about adding this feature to Chrome since 2015 here. As a workaround, I've tried a method described here to get response data, but I was not getting the expected results (if anyone else wants to give it a try). It seems, though, that Firefox has an API with this functionality. In other words, Firefox potentially has everything we need. If we decide to use Firefox, we wouldn't be able to fully port the extension to Chrome or Safari until they support an API for handling network response data. POST data is available for both Chrome and Firefox.
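For reference, the Firefox-only capability in question is webRequest.filterResponseData. A minimal sketch of reading a response body with it follows; it assumes the "webRequest", "webRequestBlocking", and "<all_urls>" permissions, and the logging is just a placeholder for our analysis.

```js
// background.js (Firefox) - hypothetical sketch using the StreamFilter API
browser.webRequest.onBeforeRequest.addListener(
  (details) => {
    const filter = browser.webRequest.filterResponseData(details.requestId);
    const decoder = new TextDecoder("utf-8");
    let body = "";

    filter.ondata = (event) => {
      body += decoder.decode(event.data, { stream: true });
      filter.write(event.data); // pass the data through so the page still loads
    };

    filter.onstop = () => {
      filter.disconnect();
      console.log("Response body for", details.url, body); // hand off to analysis here
    };
  },
  { urls: ["<all_urls>"] },
  ["blocking"]
);
```

Chrome's webRequest API has no equivalent of filterResponseData, which is what makes the port question non-trivial.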
That is excellent, @davebaraka! On the Chrome/Firefox question, I am OK with using Firefox as long as we can design this extension in a modular way with the Firefox logic clearly separated from the analysis logic, label creation logic, etc. so that we potentially can use our work in a non-Firefox context.
Yes, let's make sure we know that it does not work before going the Firefox route.
I have followed both the tutorial that David linked and another one, and it appears that Chrome has deprecated most of the functionality around webRequest, as the documentation for reading the bodies is not on their website anymore. I suggest we try out the Firefox version instead. @davebaraka, I don't mind creating that and giving it a try, if that helps. I can create the base app later today.
So I have created the Firefox extension, and it is able to do a few basic tasks, even print the headers, etc. However, I am still getting undefined when I try to look into the request bodies. There are a few discussions about it, but I don't think any of them were ever resolved. Not sure where this leaves us, as we need to be able to see the bodies for much of the relevant information.
I have found another issue on it: https://bugzilla.mozilla.org/show_bug.cgi?id=1416486
I've added the Firefox extension to the repo. To install the extension temporarily: open about:debugging in Firefox, go to This Firefox, click Load Temporary Add-on, and select the extension's manifest.json. To debug the extension: click Inspect next to it on the same about:debugging page. If you make any changes to the extension, make sure to reload it by clicking Reload there. Here are detailed docs about installation and debugging. Let me know if you experience any problems.
For the time being, we decided to keep it simple, with the extension working locally in the user's browser. So, we could use the Web Storage API to store analysis results when a user visits a site and then show them to the user in a popup or a local HTML page of the extension.
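A minimal sketch of that idea, assuming results are written from the extension's background page via the Web Storage API and read back by the popup (which share the extension's origin, so localStorage stays local to the browser). The result fields such as trackersFound are made up for illustration.

```js
// storage.js - hypothetical helpers shared by the background page and popup
function saveResult(hostname, result) {
  const all = JSON.parse(localStorage.getItem("analysisResults") || "{}");
  all[hostname] = { ...result, updatedAt: Date.now() };
  localStorage.setItem("analysisResults", JSON.stringify(all));
}

function loadResults() {
  return JSON.parse(localStorage.getItem("analysisResults") || "{}");
}

// popup.js (sketch): render stored results for the sites the user visited.
// "results" is a placeholder element assumed to exist in popup.html.
function renderPopup() {
  const results = loadResults();
  const lines = Object.entries(results)
    .map(([site, r]) => `${site}: ${r.trackersFound} trackers`)
    .join("\n");
  document.getElementById("results").textContent = lines || "No analysis results yet.";
}
```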
I will start building the UI in issue #61.
As we have now settled on Django/Python, Selenium + BrowserMob, and Heroku, the question becomes how to design and implement our analyzer. I have updated the architectural overview.
Essentially, we could design our analyzer as a web app that the developer logs in to, running their web app in the browser of our web app, which is then analyzed by us. So, everything could work remotely; the developer does not need to install anything locally on their end. As @davebaraka mentioned, maybe it is clearer to actually run two servers: one where the developer is running their web app and one with our analysis logic.
An alternative setup could be that only our analysis module runs on the server and the developer runs their app locally. Their app would then communicate with our server about the currently running analysis, results to display, etc. Maybe, in this architecture, we ask the developer to integrate into their web app an SDK that we provide, which communicates with our server and drives the analysis.
The bottom line is to come up with an architecture that does not require the developer to do a whole lot of setup steps on their end and that also makes our analysis work across different platforms (e.g., independently of whether the developer is using Windows or macOS locally).