This library provides functions to normalize, unshorten, and extract URLs.
The urlcleaner API uses JDK 8.
URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.
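To make the idea concrete, here is a minimal sketch of a few common normalization rules using only the JDK. It is not urlcleaner's implementation. The class `NormalizeSketch` and its `normalize` method are hypothetical names chosen for illustration, and the sketch assumes an absolute URL with a scheme and a host.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration, not part of urlcleaner.
class NormalizeSketch {

    // Applies a few common rules: lowercase scheme and host, drop the default
    // port, resolve "." and ".." path segments, and sort query parameters.
    // Assumes the input is an absolute URL with a scheme and a host.
    static String normalize(String url) {
        URI uri = URI.create(url).normalize();
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Keep the port only when it is not the default for the scheme.
        int port = uri.getPort();
        boolean defaultPort = (port == 80 && scheme.equals("http"))
                || (port == 443 && scheme.equals("https"));
        String authority = (port == -1 || defaultPort) ? host : host + ":" + port;

        // Sort query parameters so that their order does not matter.
        String query = uri.getRawQuery();
        if (query != null) {
            String[] params = query.split("&");
            Arrays.sort(params);
            query = String.join("&", params);
        }

        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        return scheme + "://" + authority + path + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String a = normalize("HTTP://Example.com:80/docs?lang=en&article=fred");
        String b = normalize("http://example.com/docs?article=fred&lang=en");
        System.out.println(a);           // http://example.com/docs?article=fred&lang=en
        System.out.println(a.equals(b)); // true
    }
}
```

The two inputs differ in case, port, and parameter order, yet compare equal once normalized.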
To use urlcleaner in your application, add it to your classpath. urlcleaner is available on Maven Central, so you only need to add the dependency to your build tool of choice, as shown below.
Apache Maven users can add the following to their pom.xml:
<dependencies>
    <dependency>
        <groupId>com.shekhargulati.urlcleaner</groupId>
        <artifactId>urlcleaner</artifactId>
        <version>0.4.0</version>
        <type>jar</type>
    </dependency>
</dependencies>
Gradle users can add the following to their build.gradle file:
compile(group: 'com.shekhargulati.urlcleaner', name: 'urlcleaner', version: '0.4.0', ext: 'jar')
import com.shekhargulati.urlcleaner.UrlCleaner;
UrlCleaner.normalizeUrl("shekhargulati.com"); // http://shekhargulati.com
UrlCleaner.normalizeUrl("https://www.shekhargulati.com:443"); // https://shekhargulati.com
UrlCleaner.normalizeUrl("www.shekhargulati.com"); // http://shekhargulati.com
UrlCleaner.normalizeUrl("http://shekhargulati.com/%7Eabout/"); // http://shekhargulati.com/~about
UrlCleaner.normalizeUrl("http://shekhargulati.com/hello%5Fabout/"); // http://shekhargulati.com/hello_about
UrlCleaner.normalizeUrl("http://shekhargulati.com?lang=en&article=fred"); // http://shekhargulati.com?article=fred&lang=en
UrlCleaner.normalizeUrl("http://xn--xample-hva.com"); // http://êxample.com
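The last normalization example converts a Punycode host name to its Unicode form. How urlcleaner does this internally is not stated here. As a rough sketch, the JDK's `java.net.IDN` class performs the same kind of conversion. The class `IdnSketch` and its method names are hypothetical.

```java
import java.net.IDN;

// Hypothetical illustration, not part of urlcleaner.
class IdnSketch {

    // Converts an ASCII (Punycode) host name to its Unicode form.
    static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    // Converts a Unicode host name back to its ASCII (Punycode) form.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        String unicode = toUnicodeHost("xn--xample-hva.com");
        System.out.println(unicode);              // êxample.com
        System.out.println(toAsciiHost(unicode)); // xn--xample-hva.com
    }
}
```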
UrlCleaner.unshortenUrl("http://bit.ly/1Wtrl9t"); // http://shekhargulati.com/
// It also works for URLs that have been shortened multiple times. The URL below was shortened 4 times.
UrlCleaner.unshortenUrl("http://bit.ly/1pwuGdF"); // http://www.bloomberg.com/news/articles/2016-03-17/unmasking-startup-l-jackson-silicon-valley-s-favorite-twitter-persona
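For readers curious how unshortening can work in general, the sketch below follows HTTP redirects one hop at a time until no further redirect is returned. It is an assumption about the general technique, not a description of urlcleaner's internals. `UnshortenSketch`, `unshorten`, `httpNextHop`, and the `maxHops` limit are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.function.Function;

// Hypothetical illustration, not part of urlcleaner.
class UnshortenSketch {

    // Follows redirects until nextHop returns null or maxHops is reached.
    // nextHop maps a URL to its redirect target, or null if there is none.
    static String unshorten(String url, Function<String, String> nextHop, int maxHops) {
        String current = url;
        for (int i = 0; i < maxHops; i++) {
            String location = nextHop.apply(current);
            if (location == null) {
                return current;
            }
            // A Location header may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(location).toString();
        }
        return current; // hop limit reached, which guards against redirect loops
    }

    // Sends a HEAD request without following redirects automatically and
    // returns the Location header for 3xx responses, otherwise null.
    static String httpNextHop(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            return (status >= 300 && status < 400) ? location : null;
        } catch (IOException e) {
            return null; // treat network failures as "no further redirect"
        }
    }

    public static void main(String[] args) {
        // Requires network access; the result depends on the live redirect chain.
        System.out.println(unshorten("http://bit.ly/1Wtrl9t", UnshortenSketch::httpNextHop, 10));
    }
}
```

Passing the lookup as a function keeps the redirect loop independent of the network, so it can be exercised with an in-memory map of redirects.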
final String text = "CloudABI now available for Arch Linux https://nuxi.nl/doc/archlinux/ (cmts https://google.com )";
List<String> urls = UrlExtractor.extractUrls(text);
// urls -> [https://nuxi.nl/doc/archlinux/, https://google.com]
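URL extraction of this kind is commonly done with a regular expression. The sketch below is a deliberately simplified approximation using `java.util.regex`, not urlcleaner's actual pattern. `ExtractSketch` and its pattern are hypothetical, and it only recognizes `http` and `https` URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of urlcleaner.
class ExtractSketch {

    // Simplified pattern: an http(s) scheme followed by characters that are
    // not whitespace, angle brackets, or double quotes.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://[^\\s<>\"]+");

    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            // Trim trailing punctuation that usually belongs to the sentence, not the URL.
            urls.add(matcher.group().replaceAll("[.,;:!?)]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "CloudABI now available for Arch Linux https://nuxi.nl/doc/archlinux/ (cmts https://google.com )";
        System.out.println(extractUrls(text)); // [https://nuxi.nl/doc/archlinux/, https://google.com]
    }
}
```

One known limitation of this sketch: trimming a trailing `)` also cuts URLs that legitimately end with a closing parenthesis.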
urlcleaner is licensed under the MIT License. See the LICENSE file for details.