Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtual Robots.txt module #1647

Closed
wpsmort opened this issue Apr 7, 2018 · 84 comments

Comments

@wpsmort
Copy link
Member

commented Apr 7, 2018

This is the master issue for the new virtual Robots.txt module. The following tasks need to be completed:

  • Robots.txt should filter built-in WordPress virtual robots.txt (old issue #214)
  • Remove all code related to handling a physical robots.txt file
  • Remove Robots.txt tab from File Manager module (old issue #213)
  • Remove the Optimize your Robots.txt File functionality from current module
  • Remove Save Robots.txt File button and Delete Robots.txt File button from current module
  • Store all robots.txt rules in database
  • Sanitize entries before writing to the database
  • Sanitize paths to add trailing slash when none is present
  • Check for presence of a physical robots.txt file and display a warning with prompts to Import and delete existing file, delete existing file, do nothing (old issue #596)
  • Import existing physical robots.txt file when user visits the Robots.txt module
  • Sanitize the import of existing physical robots.txt file during import
  • Support WordPress multisite with sub domain installs (old issue #1143)
  • Support WordPress multisite with domain mapping (old issue #1124)
  • Network Admins should be able to set default rules (old issue #1600)
  • Check for duplicates or conflicts for rules added by Site Admin to rules added by Network Admin
  • Allow wildcards in rules and check for duplicates or conflicts

@wpsmort wpsmort added the Enhancement label Apr 7, 2018

@wpsmort wpsmort added this to the 2.6 milestone Apr 7, 2018

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 7, 2018

Google has a developer doc for robots.txt that is a must read regarding order of precedence - https://developers.google.com/search/reference/robots_txt

In testing I found that when there is a Disallow directive and an Allow directive in the same group, the Allow directive always overrides the Disallow regardless of its position within the group (for example, the Allow can be before or after the Disallow, it will always match). This is not mentioned in the developer doc.

Example of trailing slash precedence in robots.txt:

User-agent: Googlebot-Image
Disallow: /wp-content/image-test - This Network Admin rule is intended to take precedence but it's missing the trailing slash

User-agent: Googlebot-Image
Allow: /wp-content/image-test/ - This Site Admin rule takes precedence for the directory because it has the trailing slash

Desired outcome above is that we enforce strict paths so that the Network Admin rule has the correct trailing slash and takes precedence over the Site Admin rule.


Example of rule matching:*

User-agent: Googlebot-Image
Disallow: /wp-content/image-test/ - This is Network Admin rule is intended to take precedence

User-agent: Googlebot-Image
Allow: /wp-content/image-test/ - This Site Admin rule duplicates and overrides the Network Admin rule

Desired outcome above is that a Site Admin cannot duplicate an existing rule thereby overriding a Network Admin rule.


Example of wildcard precedence in robots.txt:

User-agent: Googlebot
Allow: /wp-content/image-test/ - This Network Admin rule is intended to take precedence and block ALL Googlebots from that directory but it's overridden by the wildcard rule below

User-agent: Googlebot-Image
Disallow: /wp-* - This Site Admin wildcard rule will take precedence because it's specific for Googlebot-Image and has a wildcard

Desired outcome above is that a Site Admin cannot override by specifying a different User-agent and using wildcards.

@wpsmort wpsmort added this to Phase 1 - Initial change to virtual file in Robots.txt module - Change from physical file to virtual file Apr 7, 2018

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 7, 2018

Reported here - https://wordpress.org/support/topic/robots-txt-not-working-on-multisite/#post-9469713

Currently we create a static file for the robot.txt which means that there's one file for a multisite network. If domain mapping is active on the multisite then this causes problems because the sitemap URL will be wrong for all but the primary site.

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2018

if we are not showing the editable text area, no one can delete rules once added. Suggestions?

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Apr 10, 2018

Yes, that's a UI problem that will need to be solved. Perhaps an x next to the rule? Any other thoughts?

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 10, 2018

Currently the grey box is a text area field. We could change it to a table with one column and a row for each rule. If we did this as a table we could have the ability for users to both delete a row and also edit a row. We'd have a delete icon and an edit icon placed to the left of each row.

We would have to group rows based on the User Agents and we'd have to automatically remove the User Agent row if there are no rules present. For example, I delete the one rule below which means that the User Agent also gets removed:

User Agent: Googlebot
Disallow: /wp-admin/

Is this a practical solution and is there a library we could use that would provide what we need?

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2018

I think that's the easiest way to do this.

This also means we would need to track the rules added by each site admin so that they can delete their own rules.

The network admins can edit/delete the rules created by them. But the grey box will contain the fully formed robots.txt.

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 12, 2018

@wpsmort @michaeltorbert could you comment on the above?

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 12, 2018

@contactashish13 A site admin should be able to add, edit and delete rules on any site where they are an Administrator. I don't think that means we need to track rules added by each admin user. As an example:

  • We have a multisite with three sites - Site A, Site B and Site C
  • Each site has an admin user - Admin A, Admin B and Admin C
  • These admins can add and delete rules from their respective sites
  • We have an additional admin user Admin Z who is an admin on all three sites
  • Admin Z can add and delete rules from any of the three sites even if they were added by one of the other three admins

Does this make sense? Do you have a reason why this won't work?

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 13, 2018

@wpsmort what if Admin A wants to delete/edit his own rules? Do we want them to have this ability or only to add and do nothing else? If they should be able to edit/delete their own rules, we need to track who added what rule so that we don't show them someone else's rules to edit/delete

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2018

@contactashish13 We want to allow any administrator to make changes to site admin rules. We don't need to track which administrator adds, edits or deletes rules.

If you and I are both admins on the same site then we can both make whatever changes we want. If you want to delete something that I added then that's fine. We only need to track and protect Network Admin rules.

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 13, 2018

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2018

You can only add, edit or delete rules on a site if you have administrator role on that site.

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 13, 2018

@wpsmort Let me make the scenario clear:

  1. You are the admin for site 1
  2. I am the admin for site 2
  3. Michael is the network admin

If I add rules

  • can you delete them?
  • can I edit/delete them without tracking which rules I added?

If you add rules,

  • can I delete them?
  • can you edit/delete them without tracking which rules you added?

Michael can delete/edit only his rules because those are network wide rules.

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2018

If I’m not an Administrator on your site then I don’t have permission to log into your site and therefore cannot change any setting on your site. This doesn’t change if it’s a Multisite. When a user is created for a site in a Multisite network then they only have that role for that site. They can’t log into any other site in the network.

Does that clear things up?

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 14, 2018

@wpsmort I don't think the current UI makes sense if we are going to introduce another table underneath it for individual rules.

Something like https://datatables.net/examples/api/form.html might be more in line with what we want to do - addition/editing/deleting. Each user can then have the ability to add/edit/delete rows and then submit the whole table. Let me know what you think.

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2018

@contactashish13 I agree that a table UI would be better suited to what we want to do so that individual rules can be edited or deleted.

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Apr 17, 2018

I'm not sure that changing the UI completely is necessary at this point, especially if it adds a lot of development time, new potential bugs, etc.

Ideally we'd keep the same UI, with the addition of being able to delete rules that weren't added by the network admin.

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 17, 2018

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Apr 17, 2018

I would think adding a "delete" next to each rule is intuitive.

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Apr 17, 2018

If we do change the UI, two things we need to keep in mind:

  1. The UI should be clean and intuitive. WP of course has a built in UI that could be used (for posts, etc), or something like the Cart66 data tables UI.
  2. We need to keep in mind security of the inputs. We don't want to open up any new vulnerabilities.
    https://codex.wordpress.org/Validating_Sanitizing_and_Escaping_User_Data
    https://developer.wordpress.org/themes/theme-security/data-sanitization-escaping/
@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2018

@wpsmort in a multisite there are 3 types of admin users.

  1. Network admin - can manage plugins and thus can add robots rules
  2. Admin of child sites - can manage plugins and thus can add robots rules
  3. Admin of the main site - cannot manage plugins and thus cannot add robots rules

Therefore the visibility rules will apply to only the first 2 types of users.

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2018

@contactashish13 I was only aware of the first two. Where did you see about the third?

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2018

Real time experience :)

If I login as a network user and create an admin without giving network privileges, the user thus created is a a user that cannot manage plugins. It looks like an admin but doesn't behave like one .

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2018

OK, thanks.

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2018

@contactashish13 Thanks Ashish, I tested that and confirmed it fixes that bug. I have handed this over to our customer to conclude their acceptance testing.

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2018

@contactashish13 The customer reported one bug which appear to be caused by the JavaScript error below:

aioseop_module.js?ver=2.6-1647:875 

Uncaught TypeError: $ is not a function
    at initAll (aioseop_module.js?ver=2.6-1647:875)
    at HTMLDocument.<anonymous> (aioseop_module.js?ver=2.6-1647:869)
    at i (load-scripts.php?c=0&load[]=jquery-core,jquery-migrate,utils,moxiejs,plupload&ver=4.9.6:2)
    at Object.fireWith [as resolveWith] (load-scripts.php?c=0&load[]=jquery-core,jquery-migrate,utils,moxiejs,plupload&ver=4.9.6:2)
    at Function.ready (load-scripts.php?c=0&load[]=jquery-core,jquery-migrate,utils,moxiejs,plupload&ver=4.9.6:2)
    at HTMLDocument.K (load-scripts.php?c=0&load[]=jquery-core,jquery-migrate,utils,moxiejs,plupload&ver=4.9.6:2)

I am able to reproduce this on a brand new site by activating the Robots.txt module and editing a post. I get the JS error in my browser console.

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Jun 7, 2018

@contactashish13 What was the purpose of removing jQuery here? Removing it is what is causing the javascript conflict error. d412412

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2018

@michaeltorbert I changed this to an IIFE instead of exposing all JS functions in the global scope to prevent possible conflicts with other plugins. Thus jQuery is being passed to the entire JS file and the argument is no longer needed by the functions as they will all recognize $. Have done this now. Also, there were too many jQuery(document).ready instances and I've merged all into one. This will cause less confusion.

@michaeltorbert

This comment has been minimized.

Copy link
Member

commented Jun 8, 2018

@contactashish13 Will this mean #1726 is no longer an issue? I'd changed the names of the two functions in the latest version (which will probably cause a merge conflict btw with your latest commit).

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2018

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2018

@contactashish13 The customer has completed acceptance testing and approved this. We have one final item that needs to be completed as follows:

Remove General Settings, Performance and all other modules from the menu and Feature Manager when in the Network Admin screen.

The only menu items that should be visible at the Network Admin level are Robots.txt and Feature Manager.

The only module that should be visible in the Feature Manager at the Network Admin level is the Robots.txt module.

See screenshot
screen shot 2018-06-12 at 12 55 42 pm
:

@wpsmort wpsmort modified the milestones: 2.8, 2.7 Jun 12, 2018

@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2018

@wpsmort are these changes specific to the customer or is this behavior for anyone who deploys the plugin?

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2018

@contactashish13 This is for all and for both free and Pro versions.

michaeltorbert added a commit to contactashish13/all-in-one-seo-pack that referenced this issue Jun 19, 2018

michaeltorbert added a commit that referenced this issue Jun 19, 2018

Merge pull request #1669 from contactashish13/#1647
Virtual Robots.txt module

wpsmort added a commit that referenced this issue Jun 20, 2018

michaeltorbert added a commit that referenced this issue Jun 21, 2018

remove extra menus (#1742)
* change module class

* change menu file

* change aioseop class

* Add files via upload

* Update menu.php

* Update aioseop_module_class.php

* Fix for incorrect path

#1647
@contactashish13

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2018

@wpsmort reopening this to add test cases. Can you please provide a matrix of rules that a user can provide and the expected results (similar to what you had listed in the spreadsheet) so that I can create test cases around each of them? It would also be great if you can specify multisite/single site next to each rule.

PR: #1803

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Jul 31, 2018

@contactashish13 I've updated the Google spreadsheet with some rules for unit testing. Let me know if you think we need more test rules. The ones here should cover all cases.

https://docs.google.com/spreadsheets/d/1bRZPhoo4MurkfFHlQo-CucNp4DhASajw6Zmtka-vaRM/edit#gid=973451281

@wpsmort

This comment has been minimized.

Copy link
Member Author

commented Aug 30, 2018

Now that the unit tests have been added (PR #1803 ) this issue can be closed.

@wpsmort wpsmort closed this Aug 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.