
03-bot #11

Open
jakobzhao opened this issue Mar 29, 2021 · 18 comments

Comments

@jakobzhao
Owner

No description provided.

@jennylee719

jennylee719 commented Apr 19, 2021

04/19
Jenny Lee

Winner (2014) highlights the need to attend to the political properties of technologies and challenges the utilitarian view of technologies that is mainly concerned with their usage and technicality. In the process, Winner (2014) introduces the concept of “inherently political technologies,” which he defines as “man-made systems that appear to require or to be strongly compatible with particular kinds of political relationships” (p. 669). These technologies may or may not give people much latitude, and they are aligned with certain political arrangements and interests that are seen as a “practical necessity” (Winner, 2014, p. 674).
On the other hand, Thelwall and Stuart (2006) seem to advocate the perspective that technologies are neutral. They argue that “technology is never inherently good or bad; its impact depends upon the uses to which it is put as it is assimilated into society” (p. 1172). Following this logic, the ethics of crawling depend on how people use and practice crawling technologies, rather than on how crawling technologies are designed and how they organize social relations. For instance, when setting criteria for ethical practices, Thelwall and Stuart (2006) focus on the costs and benefits of crawling practices. The utilitarian perspective is also echoed in Li, Wang, and Bhatia's (2016) article, where the authors focus on creating a web crawling engine, PolarHub, that is effective and practical. While these are important considerations, Winner (2014) is wary of an exclusively utilitarian perspective, as it can leave us “blinded to much that is intellectually and practically crucial” (p. 671). Some lingering questions I had while reading Thelwall and Stuart’s article were the following: Who owns the data on these websites? The users? The owner of the website? Or the owners of the crawlers? Scholars in critical data studies are increasingly directing attention to the politics of platforms and how their purported neutrality enables platforms to avoid accountability while transferring responsibility to the users (Gillespie, 2010). Following this line of thought, it seems imperative that conversations surrounding ethical crawling practices also take into consideration the politics that surround data, platforms, and users. In particular, data privacy is only briefly mentioned in Thelwall and Stuart’s article, and this brevity is telling of the politics of platforms and the vulnerable positions of users.
Borrowing the words of Winner (2014): while these developments seem to progress in a temporal, logical, and inevitable manner, technologies do not happen in a vacuum and are instead “ways of building order in our world” (p. 673). In the process of structuring society, “different people are situated differently and possess unequal degrees of power as well as unequal levels of awareness” (Winner, 2014, p. 674). Therefore, to understand the politics that underlie sociotechnical systems, we need to closely examine the moral arguments surrounding practical necessity and develop a keen eye for both intractable and flexible technological systems and their ramifications.

References

Gillespie, T. (2010). The politics of ‘platforms’. New media & society, 12(3), 347-364.
Li, W., Wang, S., & Bhatia, V. (2016). PolarHub: A large-scale web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems, 59, 195–207. https://doi.org/10.1016/j.compenvurbsys.2016.07.004
Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779. https://doi.org/10.1002/asi.20388
Winner, L. (2014). Do artifacts have politics? In Philosophy of technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.

@stevenBXQ

04/19
Steven Bao

While the large-scale web crawling engine proposed by Li et al. (2016) has significant practical value, many potential ethical issues remain to be addressed, and surprisingly the authors do not mention them at all. Therefore, while acknowledging and appreciating the contribution Li and her team have made to GIScience, I would like to raise a discussion around the ethical concerns such a web crawling architecture entails.

First are the potential problems with copyright and privacy, topics highlighted by Thelwall and Stuart (2006). Even if we assume that all information displayed on web pages is in the public domain, hidden content that was possibly never intended for publication lies in the page source. Because a web crawler captures information by analyzing a website’s source code, it may collect information without the owner’s approval for public use. Such information could include private personal details, copyrighted content that the owner does not plan to share with others, or intentionally hidden fields. An automated web crawler will not distinguish these from data that are intended for publication.
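A minimal sketch of the mechanism described above (hypothetical code, not from any of the papers): a parser walking raw HTML will happily surface content that never renders in a browser, such as HTML comments and hidden form fields.

```python
from html.parser import HTMLParser

class HiddenContentFinder(HTMLParser):
    """Collects HTML comments and hidden <input> values from page source."""
    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_comment(self, data):
        # Comments are invisible in the browser but present in the source.
        self.hidden.append(("comment", data.strip()))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden":
            self.hidden.append(("hidden-input", attrs.get("value", "")))

page = """
<html><body>
  <!-- TODO: remove internal staff phone list before launch -->
  <form><input type="hidden" name="user_id" value="42"></form>
  <p>Public welcome text</p>
</body></html>
"""
finder = HiddenContentFinder()
finder.feed(page)
print(finder.hidden)
```

A crawler that stores everything it parses would retain both the comment and the hidden field, even though neither was meant for readers.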

Two other types of issues described by Thelwall and Stuart (2006) are denial of service and cost. Although bandwidth and traffic quotas, once important considerations, are usually no longer problematic for large websites, web crawling can still strain small web servers, especially when a server must transfer large amounts of data or perform intensive computation for its web services. Li et al. (2016) point out that PolarHub performs an extensive search of the web, which means every website, regardless of its server’s capability, will be crawled. Such unanticipated requests may consume the limited computational capacity or data quota that the owner provisioned for the intended uses, potentially leading to denial of service or unexpected costs. Such outcomes may also negatively affect the intended users of those web services.
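The standard mitigation for this concern is per-host rate limiting, so a crawler never fires back-to-back requests at the same server. A minimal sketch, assuming an illustrative delay value (the class and its parameters are my own, not PolarHub's):

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Enforces a minimum delay between successive requests to one host."""
    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self.last_hit = {}  # host -> timestamp of the previous request

    def wait_if_needed(self, url):
        host = urlparse(url).netloc
        now = time.monotonic()
        elapsed = now - self.last_hit.get(host, float("-inf"))
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)  # back off before touching the host again
        self.last_hit[host] = time.monotonic()

fetcher = PoliteFetcher(delay_seconds=0.2)
start = time.monotonic()
for url in ["http://example.org/a", "http://example.org/b", "http://example.org/c"]:
    fetcher.wait_if_needed(url)  # a real crawler would fetch the page here
elapsed = time.monotonic() - start
print(f"3 requests to one host took at least {elapsed:.1f}s")
```

The second and third requests each wait out the delay, so a small server sees at most one request per delay window rather than a burst.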

While I sincerely appreciate the tremendous work that Li et al. (2016) have done to democratize geospatial data, it is also important to “carefully consider potential impacts on all those affected by decisions made during design and implementation,” as the ACM code of ethics suggests (Anderson, 1992).

References:
Anderson RE (1992) ACM code of ethics and professional conduct. Communications of the ACM 35(5): 94–99. DOI: 10.1145/129875.129885.
Li W, Wang S and Bhatia V (2016) PolarHub: A large-scale web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems 59: 195–207. DOI: 10.1016/j.compenvurbsys.2016.07.004.
Thelwall M and Stuart D (2006) Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology 57(13): 1771–1779. DOI: 10.1002/asi.20388.

@larissa-soc

With the rapid growth in the complexity and accessibility of the digital space, humanity is faced with yet another frontier, and with it, endless possibilities for innovation and inevitably blind engagement with ethical boundaries yet to be established. Li et al. (2016) and Thelwall and Stuart (2006) both grapple with the expansive and complex digital field, respectively discussing technologies that oxymoronically function as centralization of decentralized information. Li introduced PolarHub as a “meta-search-based seed selection and pattern-matching based crawling strategy [which] facilitates the rapid resource identification and discovery through constraining the search scope on the Web. Also, PolarHub introduces the use of advanced asynchronous communication strategy, which combines client-pull and server-push to ensure high efficiency of the crawling system” (Li et al., 2016, p. 195). From what I understand, this means that PolarHub is essentially a sophisticated crawler that operates within the digital space between individual processing units. In doing so, it can centralize/bottleneck geospatial data by acting in a decentralized fashion (i.e., crawling around cyberspace). This kind of advancement is undoubtedly an achievement, and it is clear that PolarHub has made a positive difference in the field: “currently, the PolarHub system is up and running and is serving various scientific community that demands geospatial data” (p. 195).
But geospatial data is not the only kind of data people are interested in; retailers, browser add-ons, and personal users all utilize crawlers to access the wealth of information available on the Web (Thelwall and Stuart, 2006). I was surprised to learn that a crawler can inundate a website with requests, and that the site owner would have to pay for it in the form of additional bandwidth or experience delays in serious inquiries resulting in a denial of service. Furthermore, I had not considered the issue of copyright infringement (Thelwall and Stuart, 2006, p. 1775). The robots.txt protocol discussed in the Thelwall and Stuart piece seems to call simply for respecting a site's instruction to crawlers not to crawl it.
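For reference, honoring that instruction takes only a few lines with Python's standard library. The rules below are hypothetical, supplied inline rather than fetched from a real site:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: everyone may crawl, except under /private/,
# and crawlers should pause 5 seconds between requests.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyCrawler", "https://example.org/data/index.html"))    # True
print(parser.can_fetch("MyCrawler", "https://example.org/private/notes.html")) # False
print(parser.crawl_delay("MyCrawler"))                                         # 5
```

A polite crawler checks `can_fetch` before every request and honors the advertised crawl delay; the protocol is purely advisory, which is exactly why it raises the ethical question above.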
I may be speaking out of turn from lack of expertise, but it seems that PolarHub is not a crawler that necessarily runs into the same kinds of issues that Thelwall is trying to address. A crawler hunting for geospatial data will probably contact sites that are (a) meant to dispense data and (b) probably not showing up in the average retail crawler's results. So, is there really an ethical obligation for researchers in the context of PolarHub to follow the robots.txt protocol? In asking this question, I am reminded of Don Ihde. What kind of technology is the crawler? From my perspective, the problematic crawlers criticized by Thelwall are, for many, either background technologies or some type of embodiment relation, pulling up desired results, making suggestions, and so on without our knowledge, maintaining the environment we are accustomed to. For example, the way some of my family experiences the cyber world is through Google, which uses web crawlers in its search engine. If cyberspace is a world in its own right, should we not think of the web crawler in this case as a kind of embodiment relation in which “human beings take technological artifacts into their experiencing” (Verbeek, 2001, p. 127)? One could say it is more of a hermeneutic mediation because the world is seen through it, but I argue that you don't have to interpret a crawler's results in the case of Google: it takes you right to the content! In contrast, PolarHub reads the metadata of the individual nodes it crawls, its internal programming makes a judgment, and it returns various representations of the results that the user can then read and interpret.

Hours could be spent discussing the kind of technology a web crawler is, or if each crawler should be considered individually, but my main point is that from where I stand these things are difficult to classify philosophically, a difficulty that should translate to our moral considerations of engaging with it.

Works Cited
• Verbeek, P.P., 2001. Don Ihde: The Technological Lifeworld. In American Philosophy of Technology: The Empirical Turn. (pp. 119-146). Indiana University Press.
• Li, W., Wang, S. and Bhatia, V., 2016. PolarHub: A large-scale Web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems, 59, pp.195-207.
• Thelwall, M. and Stuart, D., 2006. Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), pp.1771-1779

@shuangw1

Winner’s paper (1986) discusses a central concept: that “technical things have political qualities.” The author uses several examples to illustrate this idea: the bridges and roads on Long Island and the farming example in California. All show that technology can benefit certain people who have privileges over others, and this gap may widen because of the extensive use of technology today.
Thelwall and Stuart (2006), by contrast, present a different view: “technology is never inherently good or bad; its impact depends upon the uses to which it is put as it is assimilated into society” (du Gay, Hall, Janes, Mackay, & Negus, 1997). The data ethics in this article triggered a lot of thinking, for example, in the realm of web crawling. Nowadays more people have access to web crawlers, so who controls the use of data and its consequences? It reminds me of several examples I have heard of. For instance, research that mapped points based on web-based exercise data accidentally revealed a military base on the map, because the soldiers usually exercised and ran on the base daily. The researchers may not have been aware of the consequences when they started this mapping project.
Finally, Li’s paper (2016) discusses PolarHub, a distributed, decentralized system that allows easy collaboration and supports automatic discovery of “deep” geospatial data on the web. They compared this service with some widely known service engines such as ESRI’s platform. I wonder, after the establishment of this platform, what the challenges are in maintaining the service. (I heard about it for the first time here, and I might try it; I have a lot of trouble finding geospatial data for my study region in rural China. Not only geospatial data but all kinds of data for less developed regions are hard to find, and I think that leads to some research bias.)
All three papers reveal the debates and concerns in a time sequence (1980s, 2000s, and 2010s), which reflects the technological development of our society today.

@nvwynn

nvwynn commented Apr 20, 2021

Given my interest in food geographies, the vignette I was most drawn to was in Winner’s paper: that of the mechanized tomato harvester. Though the intention of the UC researchers was neither to concentrate the tomato industry nor to homogenize tomato breeding, each of these consequences nonetheless unfolded. Though, at first blush, this issue may seem quite unrelated to the question of ethical web crawlers as taken up by Thelwall and Stuart, the essential takeaways are the same: technology, and the requisite research, produce both predictable and unintended consequences.
These consequences of technology can intentionally “embody a systematic social inequality” (p. 670), as with the low bridges of 1920s New York, and they can create “moral problems that lack clear solutions using existing frameworks and that require considerable intellectual effort to unravel (Johnson, 2004)” (Thelwall and Stuart, 2006). Or, as Winner argues, they are not sufficiently explained or addressed by frameworks of either technological determinism or the social determination of technology.
A recurring theme through both papers is that it is very difficult to implement ethical frameworks after a technology comes into existence: how do we decide in what manner to move forward with “specific features in the design or arrangement of a technical system after the decision to go ahead with it has already been made”? (p. 672) One suggestion given is: “One way to avoid unintentional harm is to carefully consider potential impacts on all those affected by decisions made during design and implementation.”

One question I am left to ponder is inspired by Denis Hayes’s assertion about nuclear reactors. Is there a relationship between authoritarianism or democratization that corresponds to the spatialization of such systems in terms of their centrality or dispersion?

@reconjohn

Technical systems are interwoven with politics. For example, artifacts designed in a certain way, or their systemic contexts, can be tools to exert power or to establish patterns of power. Furthermore, the social and economic systems in which technical systems are embedded shape the implications of those systems. Moreover, certain technologies are inherently political in that they require specific social structures. For example, nuclear energy is inherently autocratic, requiring a specific social structure, while rooftop solar is democratic and decentralized in that people can take advantage of an equitable clean-energy supply. That equitable energy supply will still depend on the social structure that makes it possible. In short, we cannot ignore the social or economic contexts in which artifacts are embedded (Winner, 2014).

With regard to social structure, ethics is another issue to consider in the implementation of technical systems. For example, Thelwall and Stuart (2006) discussed four ethical issues of web crawling: denial of service (due to high request volume), cost (using up the service's bandwidth), privacy (related to personal information), and copyright (copying without the owner's permission). The authors favor decision-making based on cost-benefit analysis from a utilitarian perspective, pointing out that the ethical merits of technologies depend on their uses in society. Utilitarianism and situation ethics are relativistic in that ethical judgment changes with perspective, as opposed to deontological ethics, which holds that absolute right and wrong exist (Thelwall and Stuart, 2006). This view is similar to Winner (2014), who holds that technical systems must be understood within social structure and political power.

References
Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779. 
Winner, L. (2014). Do artifacts have politics? In Philosophy of technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.

@skytruine

skytruine commented Jan 16, 2022

Yifan Sun
This week’s three papers follow a specific logic: a pure specific technology (Li et al., 2016) -> reflection on a specific technology (Thelwall and Stuart, 2006) -> a general discussion of the relationship between technology and society (Winner, 2014). Li’s paper describes the architecture and performance of a large-scale geospatial web crawler, PolarHub, which can be considered a solution to the hardship of collecting and using decentralized geospatial data. To me, the article can serve as a demo or template for writing system-design/engineering articles. Besides, a passage from the article resonated strongly with my own experience:
“Most scientific analysis is conducted by long-tail researchers, for whom data discovery may become a nightmare because they do not have an accessible tool available that can rapidly locate needed data on the internet (Heidorn, 2008).” I have to say, at the very beginning of my research career, I had a hard time getting detailed POI, road network, and administrative division data in China. The data are just there, but if you are not familiar with them and the related downloading techniques, it takes a huge amount of time to get access to them. When I became proficient in acquiring geographic data, I packaged my knowledge and skills into a free open-source software tool, OSpider. In three years, without deliberate publicity, OSpider has accumulated more than 2,000 users, mainly practitioners and researchers in urban planning and GIS. I think packaging and integrating the necessary crawlers into small, easy-to-use software can reduce researchers' work in data acquisition and preprocessing. Although this kind of project does not itself generate much scientific research value, its indirect research value is immeasurable.

Thelwall and Stuart’s paper discusses four kinds of issues a web crawler may raise (denial of service, cost, privacy, and copyright) and offers a crawler usage guideline at the end of the paper. As for Winner (2014), he argued that at least some technologies have inherent politics, and to me the most important section of the paper is “Inherently Political Technologies.” Until now, technology has had, to me, a circular relationship with society. Technology is neutral when viewed in isolation from society. However, when technology is combined with society, its character is inevitably manifested in the use and practice of that technology. The use and practice of most technologies are flexible and produce different political and economic outcomes in different societies, sometimes beyond the intended purpose of the technology. At the same time, some technological practices have an inherently political nature: centralized or decentralized, and so on.

@gracejia513

This week’s reading material presented a discussion on the morality and ethics of technological development ranging from the 1980s to the 2010s. Winner, in his work, discussed the intricate interplay between technical artifacts and the social or economic system in which they are embedded. Winner mentioned three theories: (1) technological determinism holds that technology develops unmediated by external influence and molds society “to fit its patterns”; (2) the social determination of technology states that the politics of technological things depend entirely on how society takes advantage of them; (3) the theory of technological politics pays attention to technical artifacts’ characteristics and the meaning of those characteristics. The author stated that technical things have political qualities.

Several examples in this work came back to me even more strongly when Thelwall and Stuart brought up philosophical theories such as utilitarianism and deontological ethics. Though Thelwall and Stuart focused primarily on the ethics of crawling technologies, one can easily broaden the discussion to general practices in technology. Indeed, utilitarian ethics are relativistic, and the standard by which people decide what is ethical develops over time. Moses’ low bridges and McCormick’s molding machines have shown that technology has the power to put people from some segments of society in better shape and leave others disadvantaged. Moreover, it is worthwhile to clearly define the foothold on which an ethical argument is based. While Thelwall and Stuart defined four ethical issues in web crawling (denial of service, cost, privacy, and copyright), they did not mention whether the website “owns” the data. In fact, the argument warning crawlers that there are consequences to flooding a website with massive numbers of requests can be applied to the website itself if it does not own the data it presents. Did the website obtain any permission (or any waiver) for collecting and presenting such data? Who would be held responsible if these data landed in the wrong hands? The crawler, or the entity who makes the crawling possible? These open questions are yet to be discussed.

It is exciting to see that researchers such as Li have made a relentless effort to develop an effective, reliable, and efficient cyberinfrastructure platform to facilitate research from multiple fields. Thinking about such development from an ethical lens would further perfect the framework and make the product benefit all of us.

@JerryLiu-96

This week's reading materials introduced a newly proposed cyberinfrastructure for geospatial data web crawling, some ethical discussions of web crawling, and a classic but insightful article reflecting on the relationship between technology and society.

Last week's reading material included an article that proposed an architecture for distributed geospatial data management; this week we are given an opportunity to look into the distributed acquisition of geospatial data using web crawling. The need for web crawling of geospatial data emerged from the proliferation of geospatial resources shared and made publicly available on the Web, which makes identifying the web signatures of these voluminous resources a major challenge. Thus, Li, Wang, and Bhatia (2016) designed a new web crawling platform for "responsible" web crawling, a topic that will be covered later. They restrained the search scope and crawling depth. They plan to further the system by enabling a chain of workflows to support complex scientific analysis, and by enabling crawling of non-English webpages.
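The depth restraint mentioned above can be illustrated with a toy breadth-first traversal over an in-memory link graph (the graph is fabricated for illustration; a real crawler would fetch and parse pages instead of reading a dict):

```python
from collections import deque

def crawl(seed, links, max_depth):
    """Breadth-first traversal that refuses to expand pages beyond max_depth."""
    seen, order = {seed}, []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)               # "visit" the page
        if depth == max_depth:
            continue                    # do not follow links past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

links = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
print(crawl("A", links, max_depth=2))   # ['A', 'B', 'C', 'D'] -- E lies beyond depth 2
```

Bounding the depth (and, analogously, the search scope) caps how far a crawl can wander from its seeds, which is one concrete way a crawler limits its footprint on the wider web.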

An earlier article expressed concerns over the abuse of web crawling. Thelwall and Stuart (2006) prophetically argued that "delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment." They were concerned that web crawling would become prone to abuse as it became more accessible to the general public. This is why I called Li, Wang, and Bhatia's (2016) platform "responsible": it is aware of the impact the system may have on site owners and commits to sustainable web crawling. But I am still concerned that web crawling will be abused, precisely because, as Thelwall and Stuart (2006) worried, web crawling no longer belongs exclusively to researchers.

Last but not least, we read the article about the relationship between technology and society, and we learn that technology will have a significant impact on society whether or not scientists planned it that way in the first place. It is possible that scientists design technologies purposefully to steer society, or that technologies "find" their own ways to reshape it.

As a result, I think society should be more actively involved in the development and regulation of emerging technologies; a democratic approach should be adopted. A utilitarian approach would help, but we cannot fully rely on it. Unlike fax advertising and email spam, the identity of a web crawler is more "anonymous" to the victim, so the crawler's owner has no reputational concern. Website owners could do more to prevent their sites from being crawled, but that may also discourage some benign uses of web crawling. A more democratic regulation of web crawling would be appropriate, because any regulation of the technology would undoubtedly have an impact on the general public.

@jakobzhao
Owner Author

Quotes from Grace: "Did the website obtain any permission (or any waiver) for collecting and presenting such data? Who would be held responsible if these data landed in the wrong hands? The crawler or the entity who makes the crawling possible? These open questions are yet to be discussed."

@jakobzhao
Owner Author

Quotes from Yifan: "Technology is neutral when viewed in isolation from society. However, when technology is combined with society, it is inevitably manifested in the use and practice of technology."

I think technology is always intertwined with society, but scholars may draw the boundary between the two differently.

@Jxdaydayup

Jxdaydayup commented Jan 30, 2022

It is my first time reading a paper that systematically introduces a large-scale web crawling engine (Li, Wang, and Bhatia, 2016). Both Service-Oriented Architecture (SOA) and Data Access Object (DAO)-based software are new concepts to me. Although I may have encountered them in practice before, I was not mindful of them until reading this paper, through which their definitions and logic became clearer to me. It is interesting that the authors did not discuss any ethical issues or social implications of web crawling. It might be because these issues are not their main focus. Moreover, this paper was published in a journal that attends to cutting-edge, innovative computer-based research on urban systems, systems of cities, and built and natural environments, and privileges the geospatial perspective. This implies that the ethical issues and social implications of geospatial technology are not of main interest to this journal and the relevant academic community. On the other hand, I suspect some scholars do not weigh their research on these issues at all, probably because their academic backgrounds (e.g., the education they have received) shape ways of thinking that unintentionally ignore these issues, however important they are to other scholars.

By comparison, the other two papers (Thelwall and Stuart, 2006; Winner, 1986) revisited web crawling ethics and provided deep insight into the politics of artifacts. I highly appreciate Thelwall and Stuart’s clarification of the word ethical and related concepts, including computer ethics, research ethics, and web crawling issues. Although their reliability is unclear, some analyses show that two-thirds of crawling traffic is malicious, and this proportion is still rising. Malicious crawling infringes on the rights and interests of others (e.g., the owner of the website) and on their freedom of operation. Malicious bots can plunder resources and undermine competitors. They can be abused to crawl content from one site and post it to another without revealing the source or a link, an inappropriate method that helps illegal organizations build fake websites, create fraud risks, and steal intellectual property and trade secrets. While I am not sure whether, or to what extent, US law regulates malicious bots, as far as I know it is argued that inappropriate access, collection, and interference by web crawlers should be regulated by law in China. At present, China’s existing laws regulating web crawlers mainly rest on the provisions of the criminal law related to computer information system crimes. The criminal law regulates data scraping behaviors that have a serious impact on the target website and are harmful to society. If the perpetrator violates the relevant provisions of the criminal law and collects data stored, processed, or transmitted by a website through web crawler access, it may constitute the crime of illegally obtaining computer information system data. If illegal control is exercised in the process of data capture, it may constitute the crime of illegally controlling computer information systems.
In addition, using web crawlers to interfere with the functioning of a target website, increasing its traffic and slowing its response so that normal operations are affected, may also constitute the crime of destroying computer information systems.

@S-Arnone

S-Arnone commented Feb 14, 2022

In reflecting on Thelwall and Stuart's "Web Crawling Ethics Revisited", I found it notable that the issues raised by the authors, "denial of service, cost, privacy, and copyright" (Thelwall and Stuart 2006, 1773), are more reminiscent of our structural constraints than of organic human organization. That is to say: cost, privacy (related to the state), and copyright are issues in the mediation of production and organization, rather than parts of them. Thinking back to Feenberg's article on subversive rationalization, we might note that this represents a form of anti-determinism insofar as the development of technology itself is a process that interacts intimately with our culture and its horizons.

Winner, who picks up this idea in discussing the design process itself, might note that a technological bias is evident throughout the development and operation of crawlers, from the costly development of APIs to the application and storage of crawled data. It is difficult to imagine what web crawlers could be outside the present framework, but the thought of freer information (from a monetary standpoint as well as an access one) provokes optimism. Thinking again back to Feenberg's discussion of a cultural horizon in week 2, one might note that the very ideologies onto which technology is being performed are simultaneously undermined by the pulling vision of alternative futures and liberated technologies. Drawbacks can never be avoided; as Winner noted, "many of the most important examples of technologies that have political consequences are those that transcend the simple categories 'intended' and 'unintended' altogether" (Winner 1986, 671). But this constant tension is of course the very force driving technological and social development; the future is referred to as such, naturally, because it is rooted in the present and its past.

Li et al.'s article on PolarHub seems to exemplify this basic principle of tension well, as its very starting point references the tension generated by a glut of geospatial resources, which, while undermining some aspects of research today, opens the door for substantial development. Stepping aside from questions of Winner and Feenberg, I thought it should be noted that PolarHub can be found at http://cici.lab.asu.edu/polarhub3/. In taking time to explore the crawler, which is now in its third iteration, I found it interesting that data inequalities were reflected so strongly in PolarHub's crawled results. Note, for example, that Andorra (with a size of 180 sq. mi. and a population of 80,000) maintains 33 times the number of results generated for Armenia, and 76 times the number generated for Kenya. Of course, one might object that PolarHub's results are primarily oriented toward the collection of data related to climate science. But that only goes to show the point that data production is primarily oriented toward highly developed, resource-rich countries, as research on global trends in categories as broad as "Agriculture" and "Human Dimensions" is overwhelmingly focused on studying trends that affect powerful and wealthy states and their populations.

References
Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779.
Winner, L. (2014). Do artifacts have politics? In Philosophy of Technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.

@cpuentes12

If I could do these readings over again, I would start with the Thelwall and Stuart article for its upfront and clear definition of web crawling and the related ethical concerns. Web crawling is a term I've encountered only sparsely before and never engaged with deeply, so I found almost all of the information in these readings to be new content. For this reason, I appreciated the concise way Thelwall and Stuart explain what web crawling is, the four types of issues it may raise, and why a set of ethical use guidelines is needed. The authors argue that from a philosophical moral standpoint, “[t]echnology is never inherently good or bad; its impact depends upon the uses to which it is put as it is assimilated into society” (1772). This implies not only that users of this technology are responsible for the harm it may cause, but also that the ways in which they use it, collectively, direct how that technology is adopted and viewed by the rest of society.

I found the Li, Wang, and Bhatia article rather dense and written for a niche audience more familiar with web crawling jargon than I am. As I understand it, the gist of the paper is that PolarHub is a tool able to conduct large-scale web crawling across different geospatial data and service resources, addressing the emerging problem of identifying web signatures from those myriad sources.

Finally, Langdon Winner argues that technologies are not neutral tools, but rather have inherent political implications that affect power relations, social structures, and cultural norms. I was drawn to his use of the term "technological somnambulism" to refer to the uncritical acceptance of technologies by society without fully understanding their implications, and found his use of examples of the unintended consequences of this phenomenon to be powerful (particularly the voting machine example). I tend to agree that technologies are often designed with certain values embedded in them, which reinforce existing power dynamics. These sentiments are at odds with Thelwall & Stuart’s claims that technology itself is neutral, and I wonder how different views about the bias/objectivity of technology influence the way we might imagine its uses in the future, and how to govern those uses.

Li, W., Wang, S. & Bhatia, V. (2016). PolarHub: A large-scale Web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems, 59, 195-207.
Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779.
Winner, L. (2014). Do artifacts have politics? In Philosophy of Technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.

@lizhip21

I find Winner's piece very thought-provoking. It reminds me of the age-old argument that "guns don't kill people, people do," even though guns arguably make it easier to kill. The author makes good arguments about the intended and unintended sociopolitical consequences of technologies. It reminds me of our discussion last week and the theory of technological indeterminism brought up in Feenberg's paper. Combining these two frameworks, one can argue that the course of development of our modern technology reflects the past political and social struggles behind it.

Thelwall and Stuart's article on web crawlers is an informative overview of web crawling principles and ethics. What I find most interesting in this article is how a technology, once it passes into the public domain, becomes harder to predict and regulate. In the case of web crawlers, for example, the robots.txt protocol is simply an agreement between ethical, well-behaved developers and website owners: irresponsible users can freely choose to ignore it, and inexperienced website owners may not even be aware of its existence. I can't help but think about this in the context of the powerful generative AIs we now have, and there being no real good way to detect or restrict their use in producing and spreading misinformation in the hands of irresponsible parties.
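Since robots.txt is only a convention, it is worth seeing how little machinery is involved. A minimal sketch using Python's standard-library `urllib.robotparser` (the robots.txt content and the bot name here are hypothetical, chosen only for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt, as a site owner might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler consults the file before every request;
# nothing technically prevents a bot from skipping this step.
allowed = rp.can_fetch("ExampleBot", "https://example.com/public/page")
blocked = rp.can_fetch("ExampleBot", "https://example.com/private/page")
delay = rp.crawl_delay("ExampleBot")  # seconds the owner asks for between hits
```

The whole protocol reduces to the crawler voluntarily calling `can_fetch` and honoring the answer, which is exactly the "agreement among the well-behaved" problem described above.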

I don't really have much to say about Li, Wang, and Bhatia's work on PolarHub. To me it aims at too specific an audience, and the style of writing makes it rather unfriendly to read and inaccessible to some, including me.

@yohaoyu

yohaoyu commented Apr 19, 2023

Thinkpiece: Bot - GEOG 595 Humanistic GIS (SPR 2023)

The readings for this week focus on the bot, an important tool for doing large-scale repetitive tasks. Starting from the first step, data collection, Li et al. (2016) introduce us to PolarHub, a large-scale web crawling framework and engine for geospatial data. The paper is written in a scientific style without much consideration of social impact, but we should not expect a deep treatment of technology's impact on society within a 10,000-word article.

Thelwall and Stuart (2006) discuss the ethical aspects of web crawlers, the most commonly used type of bot system. The most interesting point for me is that ‘if this kind of crawling becomes problematic, then some form of professional self-regulation (i.e., a social contract) or legal framework would be needed.’ Since this was written more than 15 years ago, I’m curious whether there is an ethical code or regulation for crawling now. According to the paper, there was none at the time, but why? Do the complexity of the Internet and ever-advancing technologies prevent us from properly regulating them?

On technology and society more generally, Winner’s (1986) paper asserts that technologies have politics rather than being neutral, which is similar to Feenberg’s (1992) idea. He believes that technology can shape our social, economic, and political patterns, and that technology is deeply embedded in its social context.

Reference:

  • Li, W., Wang, S. & Bhatia, V. (2016). PolarHub: A large-scale Web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems, 59, 195-207.
  • Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779.
  • Winner, L. (2014). Do artifacts have politics? In Philosophy of Technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.
  • Feenberg A (1992) Subversive rationalization: Technology, power, and democracy 1. Inquiry 35(3–4): 301–322. DOI: 10.1080/00201749208602296.

@amabie000

I was glad to read Thelwall and Stuart's (2006) piece on the ethics of web crawling, particularly as this moment of internet-crawling AI chatbots has captured the public’s attention and drawn more people to wonder about the ethics of that enterprise. The authors provide a useful foundation for the complexity inherent in ethical considerations (what is ethical for one may not be so for another) and the state of computer ethics at the time of writing. This is a topic of interest to me, but not one that I have yet spent much time with. One notable piece was the acknowledgement that while technology is neither inherently good nor bad, some technologies have the benefit of being “born” into a field, such as medicine, that already has deep, rich ethical and legal guidelines. Meanwhile, other technologies, such as web crawlers, come into the world through spaces that lack ethical standards, and it seems to be somewhat of a scramble for ethical and legal considerations to catch up to these technologies and their uses. It was also instructive to learn that the various ethical internet research reports did not “allude to automatic data collection” (1773), and to see no mention of the excellent points that Chase brought to the conversation last week regarding the impacts these technologies have on the environment.

The authors focus on the web crawler issues of denial of service, cost, privacy, and copyright. I must say that I am deeply concerned by their approach to privacy as an ethical issue, which seemed to focus primarily on website owners and to fall back on the idea of the internet as a public domain. To me, the ‘public’ aspect of web-based information is not a sufficient stand-in for the extractive ‘anything goes’ attitude many seem to hold, particularly around social sharing sites and platforms.

Li, Wang, and Bhatia (2016) built a platform, PolarHub, specifically to enable the discovery of disparate geospatial datasets through a web crawler. While many of the technical aspects of this paper were out of reach for me, I get the sense that PolarHub draws geospatial data primarily from OGC (Open Geospatial Consortium) partners who have already given consent of some sort. I still wonder whether the geospatial data these partners hold was ethically derived in the first place.

Winner (1986) draws attention to the political lives of technological artifacts. In this argument, the notion that the “things we call ‘technologies’ are ways of building order in our world” (672) was a powerful perspective to start from in making technological artifacts political, what Winner identifies as “arrangements of power and authority in human associations… [and] the activities that take place within those arrangements” (669).
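The "cost" and "denial of service" concerns raised by Thelwall and Stuart ultimately come down to request pacing: a crawler that spaces out its hits on any one host imposes little load, while one that does not can degrade service for everyone. As a minimal sketch of such voluntary politeness (the class name and the five-second default are my own assumptions, not anything prescribed in the article):

```python
import time
from collections import defaultdict


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host,
    one informal remedy for the crawler 'cost' and 'denial of
    service' issues discussed above."""

    def __init__(self, min_delay=5.0, clock=time.monotonic):
        self.min_delay = min_delay          # seconds between hits per host
        self.clock = clock                  # injectable for testing
        self.last_request = defaultdict(lambda: None)

    def wait_time(self, host):
        """Seconds the crawler should still wait before hitting `host`."""
        last = self.last_request[host]
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record(self, host):
        """Note that a request to `host` was just issued."""
        self.last_request[host] = self.clock()
```

Crucially, nothing forces a crawler to use such a throttle; like robots.txt, this kind of politeness is exactly the voluntary "social contract" the authors describe, which is why they raise it as an ethical rather than a technical question.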

Li, W., Wang, S. & Bhatia, V. (2016). PolarHub: A large-scale Web crawling engine for OGC service discovery in cyberinfrastructure. Computers, Environment and Urban Systems, 59, 195-207.

Thelwall, M., & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy, and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771–1779.

Winner, L. (2014). Do artifacts have politics? In Philosophy of Technology: The technological condition: An anthology (2nd ed., p. 12). John Wiley & Sons, Inc.

@cpuentes12

The Zhang, Zhao, Tian & Chen article explores the role and implications of geospatial big data in the era of "post-truth", in the context of the Standing Rock resistance movement against the construction of the Dakota Access Pipeline. The study looked at location spoofing as an entry point for discussion about the nature of truth in the current digital age, as well as a tool for sovereignty used by protestors on the ground and in remote solidarity. I actually checked in at Standing Rock on Facebook when this movement was happening in 2016 and remember thinking that it represented a new turn for activism; people leveraging social media technology could support resistance that was physically happening hundreds or thousands of miles away from them. This ties in nicely with the conversations we've been having about Feenberg's and Winner's ideas on the social and political power of technology, and presents a great example of a case in which the oppressed were able to use those tools to challenge the dominant hegemony (the pipeline/big energy). I was also interested to see the statistical breakdown of where check-ins were coming from, which spanned well beyond the US, indicating global interest and involvement in what was initially just a local tribal affair.

One question/concern I have about this type of activism though is if it has the potential to replace community-based and grassroots efforts in person. For instance, if someone who did have the means to go to Standing Rock and offer their labor and presence in person knows that they can contribute to the cause by just checking in on Facebook, will the number of protestors at actual sites of conflict diminish? Or, like we saw in Bo and Xu’s previous article about the cryptoplace memorial, is it just a parallel venue for participation that allows even more access to these movement spaces?

Perhaps I didn’t read the Kwan piece closely enough, but to me it seems rather apparent that neighborhood health studies are impacted by other variables, such as interactions with friends, families, and peers in various nonresidential contexts, and that these variables can change over space and time. Thus I was a bit confused by the severely technical way the author writes about the topic, and feel it would be a much more effective piece if it were simplified.

Finally, I thought the MacEachren et al. piece about geospatial information uncertainty was a great read namely because it triggered new ways of thinking about how uncertainties are perceived in different contexts. Specifically, as I engage with non-academics in climate change work, it can be difficult to communicate what statistical uncertainty is, and why it exists. The way this paper plays with different ways these uncertainties may be visualized was interesting and useful!
