Global history contains URLs redirecting to other pages #1345

Closed
lahwaacz opened this Issue Mar 16, 2016 · 21 comments

Collaborator

lahwaacz commented Mar 16, 2016

For example:

  1. Open https://duckduckgo.com/?q=stackoverflow

  2. Follow the first result (leads to http://stackoverflow.com/)

  3. Then the last two entries in ~/.local/share/qutebrowser/history will be:

    <timestamp> https://duckduckgo.com/?q=stackoverflow
    <timestamp> http://r.duckduckgo.com/l/?kh=-1&uddg=http%3A%2F%2Fstackoverflow.com%2F

(For reproducing, note that I have the option "Redirect -- Prevent sharing of your search with sites you click on" enabled in DuckDuckGo's settings.)

Since the global history contains only the URLs, this issue makes it impossible to find some URLs (users will usually not notice the redirect URL at all, so sometimes there is no common search term).

Collaborator

The-Compiler commented Mar 17, 2016

Currently qutebrowser uses Qt's QWebHistoryInterface, so there isn't much I can do about this - but with QtWebEngine there'll probably be a self-made history implementation instead, where I can take care of this.

Collaborator

toofar commented Mar 18, 2016

This has been grinding my gears forever too. It is totally fixable if we were to just add things to history from WebViews when they reach a certain load status or progress point. E.g.:

  1. get url to load
    • loadStarted
  2. connect to remote host
  3. get some body data
    • initialLayoutCompleted
  4. get all the data
  5. finish loading
    • loadFinished

I am thinking of going for number 3 so that we don't save redirects. Although maybe we want to? Brief testing shows that using the initialLayoutCompleted signal saves the final page for 301, 302, 303, 307 and 308 redirects, and for a chain of all of them in that order. The current method saves just the initial URL in that redirect chain. When using <META http-equiv="refresh" ..., both the redirecting and the redirected-to pages get added to history with either method.

Collaborator

The-Compiler commented Mar 18, 2016

I agree a custom history implementation (maybe even before QtWebEngine) is the way to go, then we finally can have titles in the completion for the history too.

I suggest looking into how Otter Browser or QupZilla solve this.

Collaborator

toofar commented Mar 18, 2016

Umm, I actually have some scrappy changes lying around from last time I looked at (and then abandoned) the history completion stuff to add titles to history entries, file and completion. Just changed webview to connect to page.mainFrame().initialLayoutCompleted and call addHistoryEntry() and passed title through various places and it is working if you want to have a look at that.

Collaborator

toofar commented Mar 18, 2016

diff --git i/qutebrowser/browser/history.py w/qutebrowser/browser/history.py
index 050220d4725c..6c58be43f1de 100644
--- i/qutebrowser/browser/history.py
+++ w/qutebrowser/browser/history.py
@@ -40,18 +40,18 @@ class HistoryEntry:
         url_string: The URL which was accessed as string.
     """

-    def __init__(self, atime, url):
+    def __init__(self, atime, url, title):
         self.atime = float(atime)
         self.url = QUrl(url)
         self.url_string = url
+        self.title = title

     def __repr__(self):
         return utils.get_repr(self, constructor=True, atime=self.atime,
-                              url=self.url.toDisplayString())
+                              url=self.url.toDisplayString(), title=self.title)

     def __str__(self):
-        return '{} {}'.format(int(self.atime), self.url_string)
-
+        return '{} {} {}'.format(int(self.atime), self.url_string, self.title)

 class WebHistory(QWebHistoryInterface):

@@ -118,16 +118,20 @@ class WebHistory(QWebHistoryInterface):
         with self._lineparser.open():
             for line in self._lineparser:
                 yield
-                data = line.rstrip().split(maxsplit=1)
+                data = line.rstrip().split(maxsplit=2)
                 if not data:
                     # empty line
                     continue
-                elif len(data) != 2:
+                elif len(data) == 2:
+                    atime, url = data
+                    title = ""
+                elif len(data) == 3:
+                    atime, url, title = data
+                else:
                     # other malformed line
                     log.init.warning("Invalid history entry {!r}!".format(
                         line))
                     continue
-                atime, url = data
                 if atime.startswith('\0'):
                     log.init.warning(
                         "Removing NUL bytes from entry {!r} - see "
@@ -139,7 +143,7 @@ class WebHistory(QWebHistoryInterface):
                 # information about previous hits change the items in
                 # old_urls to be lists or change HistoryEntry to have a
                 # list of atimes.
-                entry = HistoryEntry(atime, url)
+                entry = HistoryEntry(atime, url, title)
                 self._add_entry(entry)

         self._initial_read_done = True
@@ -169,7 +173,7 @@ class WebHistory(QWebHistoryInterface):
         self._lineparser.save()
         self._saved_count = len(self._new_history)

-    def addHistoryEntry(self, url_string):
+    def addHistoryEntry(self, url_string, title=""):
         """Called by WebKit when an URL should be added to the history.

         Args:
@@ -179,7 +183,8 @@ class WebHistory(QWebHistoryInterface):
             return
         if config.get('general', 'private-browsing'):
             return
-        entry = HistoryEntry(time.time(), url_string)
+        entry = HistoryEntry(time.time(), url_string, title)
         if self._initial_read_done:
             self.add_completion_item.emit(entry)
             self._new_history.append(entry)
@@ -208,4 +216,4 @@ def init(parent=None):
     """
     history = WebHistory(parent)
     objreg.register('web-history', history)
-    QWebHistoryInterface.setDefaultInterface(history)
diff --git i/qutebrowser/browser/webview.py w/qutebrowser/browser/webview.py
index 582144444d43..a3a8a3e7ab06 100644
--- i/qutebrowser/browser/webview.py
+++ w/qutebrowser/browser/webview.py
@@ -142,9 +142,18 @@ class WebView(QWebView):
         if config.get('input', 'rocker-gestures'):
             self.setContextMenuPolicy(Qt.PreventContextMenu)
         self.urlChanged.connect(self.on_url_changed)
         self.loadProgress.connect(lambda p: setattr(self, 'progress', p))
         objreg.get('config').changed.connect(self.on_config_changed)

+    @pyqtSlot()
+    def on_initial_layout_complete(self):
+        objreg.get('web-history').addHistoryEntry(self.url().toDisplayString(), self.title())
+
     def _init_page(self):
         """Initialize the QWebPage used by this view."""
         page = webpage.BrowserPage(self.win_id, self.tab_id, self)
@@ -152,6 +161,7 @@ class WebView(QWebView):
         page.linkHovered.connect(self.linkHovered)
         page.mainFrame().loadStarted.connect(self.on_load_started)
         page.mainFrame().loadFinished.connect(self.on_load_finished)
+        page.mainFrame().initialLayoutCompleted.connect(self.on_initial_layout_complete)
         page.statusBarMessage.connect(
             lambda msg: setattr(self, 'statusbar_message', msg))
         page.networkAccessManager().sslErrors.connect(
diff --git i/qutebrowser/completion/models/urlmodel.py w/qutebrowser/completion/models/urlmodel.py
index b31ab98096cf..53f78a0cb416 100644
--- i/qutebrowser/completion/models/urlmodel.py
+++ w/qutebrowser/completion/models/urlmodel.py
@@ -99,7 +99,8 @@ class UrlCompletionModel(base.BaseCompletionModel):

     def _add_history_entry(self, entry):
         """Add a new history entry to the completion."""
-        self.new_item(self._history_cat, entry.url.toDisplayString(), "",
+        self.new_item(self._history_cat, entry.url.toDisplayString(),
+                      entry.title,
                       self._fmt_atime(entry.atime), sort=int(entry.atime),
                       userdata=entry.url)

@@ -119,14 +120,19 @@ class UrlCompletionModel(base.BaseCompletionModel):
     @pyqtSlot(object)
     def on_history_item_added(self, entry):
         """Slot called when a new history item was added."""
         for i in range(self._history_cat.rowCount()):
             url_item = self._history_cat.child(i, self.URL_COLUMN)
             atime_item = self._history_cat.child(i, self.TIME_COLUMN)
+            title_item = self._history_cat.child(i, self.TEXT_COLUMN)
             url = url_item.data(base.Role.userdata)
             if url == entry.url:
                 atime_item.setText(self._fmt_atime(entry.atime))
+                title_item.setText(entry.title)
                 url_item.setData(int(entry.atime), base.Role.sort)
                 break
         else:
             self._add_history_entry(entry)
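
For illustration, the backwards-compatible line parsing from the diff above can be tried outside the browser. This is only a minimal standalone sketch: the function name `parse_history_line` is hypothetical and not part of qutebrowser, and it assumes the on-disk format is `<atime> <url>` with an optional trailing `<title>`, as the diff suggests.

```python
def parse_history_line(line):
    """Parse '<atime> <url> [<title>]' into (atime, url, title).

    Returns None for empty or malformed lines. Old two-field lines
    (written before titles were stored) get an empty title.
    """
    # maxsplit=2 keeps any spaces inside the title intact.
    data = line.rstrip().split(maxsplit=2)
    if len(data) == 2:
        atime, url = data
        title = ""
    elif len(data) == 3:
        atime, url, title = data
    else:
        # Empty line or a line with only one field.
        return None
    try:
        return float(atime), url, title
    except ValueError:
        # The first field was not a timestamp.
        return None
```

Splitting with `maxsplit=2` is what lets old and new entries share one file: a title may contain spaces, but a URL in this format does not.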

Collaborator

The-Compiler commented Mar 18, 2016

That looks quite good actually! I still want to look at how others did it, but it looks fine other than that - want to open a PR with it, if it works from your point of view?

Collaborator

toofar commented Mar 18, 2016

Sweet as, I'll do that and polish it a bit on Sunday.

Collaborator

lahwaacz commented Jun 8, 2016

After #1350, the redirect target URL is recorded in history, but I'm wondering if the original URL should be recorded at all? It's basically useless for completion, but useful to QtWebKit for highlighting visited links...

Collaborator

The-Compiler commented Jun 8, 2016

I think it's useful for completion too. Say you accidentally open www.website-with-typo.com and that redirects you to www.some-shitty-spammy-page.com - would you look for the former or the latter when completing to correct your typo?

Collaborator

lahwaacz commented Jun 8, 2016

Then maybe keep it just in the per-tab local history? Considering the DuckDuckGo case above, there will be duplicate completions for every search result URL, which is much more frequent than URLs with a typo redirecting to spam.

Collaborator

The-Compiler commented Jun 8, 2016

Hm, that's a valid point. I'm wondering what @toofar thinks, but if he doesn't answer by the v0.7 release (which is later today or early tomorrow) I'll probably implement that.

The-Compiler added a commit that referenced this issue Jun 8, 2016

Collaborator

The-Compiler commented Jun 8, 2016

I committed it for now, I plan to release tomorrow.

Collaborator

lahwaacz commented Jun 8, 2016

Now for me these two URLs are added to the history file when I open the first URL:

1465400776 http://r.duckduckgo.com/l/?kh=-1&uddg=https%3A%2F%2Fbbs.archlinux.org%2F 
1465400777 https://bbs.archlinux.org/ Arch Linux Forums

The strange thing is that this does not happen with --temp-basedir...

Also, it seems that the result will be that when the browser is restarted, the redirecting URLs will not be marked as visited, which could be sometimes confusing. I guess the only way to solve this would be adding a "redirect" field to the history file?

Collaborator

The-Compiler commented Jun 8, 2016

> Also, it seems that the result will be that when the browser is restarted, the redirecting URLs will not be marked as visited, which could be sometimes confusing. I guess the only way to solve this would be adding a "redirect" field to the history file?

Hm, that's true. Kind of wondering if I should revert e08c6cb for v0.7.0 (so both URLs are kept for now), and then implement something like that once I refactor the history code for QtWebEngine.

Collaborator

The-Compiler commented Jun 8, 2016

> Now for me these two URLs are added to the history file when I open the first URL

Note the code never touches the existing contents of the file, it always only appends to it. Are you looking at historical history (hah!) by any chance? That'd explain it.

Collaborator

lahwaacz commented Jun 8, 2016

I dug the URL out of the ancient history (ddg now even uses different URLs for redirects), opened it in a new tab and it was added to the bottom of the history file. I even restarted the browser multiple times to be sure that I'm not running an older version...

Collaborator

lahwaacz commented Jun 8, 2016

Now I've been able to reproduce it in --temp-basedir where I copied my qutebrowser.conf and :restarted. Not sure which option made the difference though...

Collaborator

The-Compiler commented Jun 10, 2016

I now pushed 66938ed which adds a -r flag to the timestamp (for maximum backwards compatibility when someone reads the history with external tools, hopefully).

URLs with a -r flag are marked as visited, but don't show up in the completion. For them to go away when they're in an existing history, you need to visit them (so they get recorded with -r) and restart.

As for the bug you mentioned, I haven't seen that so far. I'll release v0.7.0 ASAP now so I can do so before my weekend starts. If you find out more, I can always do a v0.7.1.
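
As a rough sketch of how an external tool might read such a history file: the code below assumes the flag is appended directly to the timestamp (for example `1465400776-r`), which is an interpretation of "adds a -r flag to the timestamp" and is not confirmed in this thread. The names `parse_entry` and `split_history` are hypothetical and not qutebrowser API.

```python
def parse_entry(line):
    """Parse '<atime>[-r] <url> [<title>]'; return a dict, or None if malformed."""
    data = line.rstrip().split(maxsplit=2)
    if len(data) < 2:
        return None
    stamp, url = data[0], data[1]
    title = data[2] if len(data) == 3 else ""
    # Assumed layout: flags follow the timestamp after a dash, e.g. "1465400776-r".
    atime, _sep, flags = stamp.partition("-")
    try:
        atime = int(atime)
    except ValueError:
        return None
    return {"atime": atime, "url": url, "title": title, "redirect": "r" in flags}


def split_history(lines):
    """Return (visited, completion): every URL vs. only non-redirect URLs."""
    visited, completion = [], []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        # Redirect entries still count as visited (for link highlighting) ...
        visited.append(entry["url"])
        # ... but are kept out of the completion.
        if not entry["redirect"]:
            completion.append(entry["url"])
    return visited, completion
```

A tool that only does `line.split()` and reads the URL field keeps working on flagged lines, which seems to be the backwards-compatibility point made above.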

Collaborator

lahwaacz commented Jun 11, 2016

Actually, it seems to be a problem with the detection of redirects itself. The URL above is currently not marked with -r for me, even with --temp-basedir. For example with Python requests, I get the following:

>>> import re
>>> import requests
>>> r = requests.get("http://r.duckduckgo.com/l/?kh=-1&uddg=https%3A%2F%2Fbbs.archlinux.org%2F")
>>> r.status_code
200
>>> r.url
'http://r.duckduckgo.com/l/?kh=-1&uddg=https%3A%2F%2Fbbs.archlinux.org%2F'
>>> r.history
[]
>>> print(re.sub(r"(</[a-z]*>)", "\\1\n", r.text))
<html><head><meta name='referrer' content='origin'></head>
<body><script language='JavaScript'>window.parent.location.replace("https://bbs.archlinux.org/");</script>
<noscript><META http-equiv='refresh' content="0;URL='https://bbs.archlinux.org/'"></noscript>
</body>
</html>

So there is some JavaScript involved and due to <noscript> the URL is redirected even if JavaScript is turned off in qutebrowser. I think that qutebrowser should be able to detect both of these approaches and consider them as redirects.
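
To make the two mechanisms concrete, here is a hedged sketch that scans a response body for both a `<meta http-equiv="refresh">` target and a literal `location.replace("...")` call, using only the Python standard library. It is not how qutebrowser detects redirects; `RedirectFinder` and `find_client_redirects` are hypothetical names, and the regular expression only catches the simple string-literal form shown in the output above.

```python
import re
from html.parser import HTMLParser

# Only matches location.replace() called with a plain string literal.
_JS_REPLACE = re.compile(r"""location\.replace\(\s*["']([^"']+)["']\s*\)""")


class RedirectFinder(HTMLParser):
    """Collect (kind, url) pairs for client-side redirects found in a page."""

    def __init__(self):
        super().__init__()
        self.targets = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("http-equiv") or "").lower() == "refresh":
                # content looks like "0;URL='https://example.org/'"
                _delay, _sep, rest = (attrs.get("content") or "").partition(";")
                key, _eq, url = rest.strip().partition("=")
                if key.strip().lower() == "url" and url:
                    self.targets.append(("meta-refresh", url.strip().strip("'\"")))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            for match in _JS_REPLACE.finditer(data):
                self.targets.append(("javascript", match.group(1)))


def find_client_redirects(html):
    """Return a list of (kind, url) tuples in document order."""
    finder = RedirectFinder()
    finder.feed(html)
    finder.close()
    return finder.targets
```

A scan like this cannot tell a dedicated redirect page from a content page that happens to navigate away later, so it would at best be a heuristic.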

Collaborator

toofar commented Jun 12, 2016

Hate to be that guy but if they want to redirect you they should stick to standards and use an appropriate status code. Also there is no simple way to detect whether you are being redirected from a redirect page or from some content page, except for special casing of course.

Collaborator

The-Compiler commented Jun 12, 2016

I opened #1574 for this now.
